Digital Analysis of Molecular Analytes Using Single Molecule Detection

ABSTRACT

Methods and systems are provided for small molecule analyte detection using digital signals, key encryption, and communications protocols. The methods provide detection of large numbers of proteins, peptides, RNA molecules, and DNA molecules in a single optical or electrical detection assay within a large dynamic range.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of U.S. Provisional Application No. 61/869,020. The entire teachings of the above application are incorporated herein by reference for all purposes.

BACKGROUND

1. Technical Field

This disclosure relates to the fields of diagnostics and communications theory, and specifically relates to methods for digital analysis of molecular analytes.

2. Description of the Related Art

Multiple molecular and biochemical approaches are available for molecular analyte identification and quantification. Examples include commonly used nucleic acid based assays, such as qPCR (quantitative polymerase chain reaction) and DNA microarray, and protein based approaches, such as immunoassay and mass spectrometry. However, various limitations exist in current analyte analysis technologies. For example, current methods have limitations of sensitivity, especially where analytes are present in biological samples at low copy numbers or in low concentrations. Most of the nucleic acid quantification technologies involve sample amplification for higher sensitivity. However, amplification techniques introduce biases and inaccuracies into the quantification. Moreover, amplification is not possible for proteins and peptides. Due to lack of sensitivity, approaches for detection and quantification often require relatively large sample volumes. Current methods are also limited in their capacity for identification and quantification of a large number of analytes. Quantification of all of the mRNA and proteins in a sample requires high multiplexity and large dynamic range. In addition, current technologies lack the capability to detect and quantify nucleic acids and proteins simultaneously.

Current methods often generate errors during analyte detection and quantification due to conditions such as weak signal detection, false positives, and other mistakes. These errors may result in the misidentification and inaccurate quantification of analytes.

Therefore, methods and systems are needed for analyte analysis that allow for high sensitivity with small sample volume, high multiplexity, large dynamic range and the ability to detect protein and nucleic acid molecules in a single assay. More importantly, methods of error correction to correct for analyte detection errors are needed. The present invention addresses these and other limitations of the prior art by introducing sensitive single molecule identification and quantification of biological analytes with a digital readout.

BRIEF DESCRIPTIONS OF THE DRAWINGS

The disclosed embodiments have other advantages and features which will be more readily apparent from the following detailed description of the invention and the appended claims, when taken in conjunction with the accompanying drawings, in which:

Figure (or “FIG.”) 1 is a high-level block diagram illustrating an example of a computer, according to one embodiment of the invention.

FIG. 2A illustrates an example of a probe comprising an antibody and a detectable tag, where the probe binds a target protein, according to one embodiment of the invention.

FIG. 2B illustrates an example of a probe comprising a primary antibody and a secondary antibody conjugated to a detectable tag, according to one embodiment of the invention.

FIG. 3 shows a target analyte bound to a probe comprising an aptamer and a tail region, according to one embodiment of the invention.

FIG. 4 shows a fluorescent tag attached to a probe comprising an aptamer and a tail region, according to one embodiment of the invention.

FIG. 5 shows an example of a probe comprising an antibody linked to a region that can hybridize to a tail region, according to one embodiment of the invention.

FIG. 6 illustrates an example of a probe comprising a primary antibody and a secondary antibody conjugated to a tail region, according to one embodiment of the invention.

FIG. 7 shows an example of a solid substrate bound with a sample comprising analytes (e.g., proteins, DNA and/or RNA), according to one embodiment of the invention.

FIG. 8 shows an example substrate (10×10 array) for binding analytes, according to one embodiment of the invention.

FIG. 9 is a top view of a solid substrate with analytes randomly bound to the substrate, according to one embodiment of the invention.

FIGS. 10A-10D illustrate an example of sixteen target proteins arranged on a substrate. FIGS. 10B and 10C depict examples of images of the substrate after contact with different probe pools, according to one embodiment of the invention.

FIG. 11 illustrates an example Reed-Solomon error correction structure, according to one embodiment of the invention.

FIG. 12A illustrates an example substrate divided into three regions depicting target analyte concentration levels, according to one embodiment of the invention.

FIGS. 12B-12C show graphs of target analyte abundance ranges, according to one embodiment of the invention.

FIG. 13 illustrates an example detection assay using a substrate and four analytes using a single color fluorescent tag, single pass, dark counted, and 1 bit per cycle, according to one embodiment of the invention.

FIG. 14 shows an example detection assay using a substrate and four analytes using a single color fluorescent tag, four passes per cycle, dark cycle not counted, and 2 bits per cycle, according to one embodiment of the invention.

FIG. 15 shows color sequences and IDs for target analytes, and shows scanning results of the target analytes for the probing, binding, and stripping cycles, according to one embodiment of the invention.

FIG. 16A shows the numbers of specific target analytes identified on different portions of a substrate, according to one embodiment of the invention.

FIG. 16B shows color sequences and IDs for target analytes, according to one embodiment of the invention.

FIG. 17 is an image of single fluor probes hybridized to target analytes bonded to a substrate, according to one embodiment of the invention.

FIG. 18 illustrates examples of identification of various target analytes using single fluor detection, according to one embodiment of the invention.

FIG. 19 shows color sequences and IDs for two target analytes, and shows scanning results of the target analytes for the probing, binding, and stripping cycles, according to one embodiment of the invention.

FIG. 20 is an image of single molecule peptides bound to a substrate, hybridized with conjugated antibodies, according to one embodiment of the invention.

FIG. 21 shows a probability plot of estimated concentrations of proteins from the UniProt database, according to one embodiment of the invention.

FIG. 22 shows a list of estimated values for various abundance regions of a substrate, according to one embodiment of the invention.

FIG. 23 is a simulated image of protein identification across any abundance range, according to one embodiment of the invention.

FIG. 24 illustrates a graph of system error rate vs. raw error rate for identifying target analytes, according to one embodiment of the invention.

SUMMARY OF THE INVENTION

The invention provides systems and methods for detecting a plurality of analytes, comprising: obtaining a plurality of ordered probe reagent sets, each of the ordered probe reagent sets comprising one or more probes directed to a defined subset of N distinct target analytes, wherein the N distinct target analytes are immobilized on spatially separate regions of a substrate, and each of the probes is detectably labeled. The method also includes steps for performing at least M cycles of probe binding and signal detection, each cycle comprising one or more passes, wherein a pass comprises use of at least one of the ordered probe reagent sets. The method comprises detecting from the at least M cycles a presence or an absence of a plurality of signals from the spatially separate regions of the substrate.

The method includes determining from the plurality of signals at least K bits of information per cycle for one or more of the N distinct target analytes, wherein the at least K bits of information are used to determine L total bits of information, wherein K×M=L bits of information and L>log₂(N), and wherein the L bits of information are used to determine a presence or an absence of one or more of the N distinct target analytes.
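
As a rough illustration of the relationship K×M=L with L>log₂(N), the following Python sketch computes the smallest number of cycles M that satisfies the strict inequality for a given N and K. The function name `min_cycles` and the example values of N are illustrative assumptions and are not taken from the disclosure.

```python
def min_cycles(n_targets: int, bits_per_cycle: int) -> int:
    """Return the smallest M such that L = K * M satisfies L > log2(N).

    n_targets      -- N, the number of distinct target analytes (hypothetical input)
    bits_per_cycle -- K, the bits of information obtained per cycle
    """
    if n_targets < 1 or bits_per_cycle < 1:
        raise ValueError("N and K must both be at least 1")
    cycles = 1
    # L > log2(N) is equivalent to 2**L > N; integer arithmetic avoids
    # floating-point rounding at exact powers of two.
    while 2 ** (bits_per_cycle * cycles) <= n_targets:
        cycles += 1
    return cycles


if __name__ == "__main__":
    for n, k in [(16, 1), (16, 2), (1000, 2)]:
        m = min_cycles(n, k)
        print(f"N={n}, K={k}: M={m}, L={k * m}")
```

For N=16 and K=1, four cycles yield exactly log₂(16)=4 bits, so the strict inequality requires a fifth cycle. The bits in excess of log₂(N) are the ones that remain available for uses such as the error correction mentioned in this summary.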

In some embodiments, L>log₂(N), and L comprises bits of information for target identification. In other embodiments, L>log₂(N), and L comprises bits of information that are ordered in a predetermined order.

In one embodiment, the predetermined order is a random order. In another embodiment, L>log₂(N), and L comprises bits of information comprising a key for decoding an order of the plurality of ordered probe reagent sets.

The method also includes digitizing the plurality of signals to expand a dynamic range of detection of the plurality of signals. In some embodiments, the at least K bits of information comprise information about the number of passes in a cycle. In another embodiment, the at least K bits of information comprise information about the absence of a signal for one of the N distinct target analytes.

In one embodiment, the detectable label is a fluorescent label. In another embodiment, the probe comprises an antibody. In one embodiment, the antibody is conjugated directly to a label. The antibody can also be bound to a secondary antibody conjugated to a label. In other embodiments, the probe comprises an aptamer. In one embodiment, the aptamer comprises a homopolymeric base region. In other embodiments, the plurality of analytes comprises a protein, a peptide aptamer, or a nucleic acid molecule.

The method can include detecting from the at least M cycles a presence or an absence of a plurality of optical signals. The method can also include detecting from the at least M cycles a presence or an absence of a plurality of electrical signals.

In one embodiment, the method is computer implemented. In another embodiment, K is one bit of information per cycle. In other embodiments, K is two bits of information per cycle. K can also be three or more bits of information per cycle.

In another embodiment, the method includes determining from the L bits of information an error correction for the plurality of output signals. The error correction method can be a Reed-Solomon code.

In one embodiment, the method comprises determining a number of ordered probe reagent sets based on the number of N distinct target analytes. The method can also include determining a type of probe reagent sets based on the type of N distinct target analytes.

In an embodiment, the N distinct target analytes are present in a sample, which is divided into a plurality of aliquots diluted to a plurality of distinct final dilutions, and each of the plurality of aliquots is immobilized onto a distinct section of the substrate. In another embodiment, one of the distinct final dilutions is determined based on a probable naturally-occurring concentration of at least one of the N distinct target analytes. In another embodiment, a concentration of one of the N distinct target analytes is determined by counting the occurrences of the target analyte within one of the distinct sections and adjusting the count according to the dilution of the respective aliquot.
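
The count adjustment described in the preceding paragraph can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function name `estimate_undiluted_count` is hypothetical, and the dilution is assumed to be expressed as a fold factor (for example, 1000 for a 1:1000 dilution), which the disclosure does not specify.

```python
def estimate_undiluted_count(observed_count: int, dilution_factor: float) -> float:
    """Scale a single molecule count from one substrate section back to the
    undiluted sample.

    observed_count  -- occurrences of the target analyte counted in the section
    dilution_factor -- assumed fold dilution of the aliquot (1.0 = undiluted)
    """
    if observed_count < 0:
        raise ValueError("observed_count cannot be negative")
    if dilution_factor < 1.0:
        raise ValueError("dilution_factor is assumed to be >= 1.0")
    # Each counted molecule stands for dilution_factor molecules in an
    # equivalent volume of the undiluted sample.
    return observed_count * dilution_factor


if __name__ == "__main__":
    # Hypothetical example: 120 molecules counted in a section holding a
    # 1:1000 dilution of the sample.
    print(estimate_undiluted_count(120, 1000.0))
```

Converting the adjusted count to a concentration would additionally require the volume of sample deposited on the section, which is left out of this sketch.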

The invention includes a kit for detecting a plurality of analytes, comprising: a plurality of ordered probe reagent sets, each of the ordered probe reagent sets comprising one or more probes directed to a defined subset of N distinct target analytes, wherein the N distinct target analytes are immobilized on spatially separate regions of a substrate, and each of the probes is detectably labeled. The kit includes instructions for detecting said N distinct analytes based on a plurality of detectable signals. The kit includes instructions for performing at least M cycles of probe binding and signal detection, each cycle comprising one or more passes, wherein a pass comprises use of at least one of the ordered probe reagent sets. The kit includes instructions for detecting from the at least M cycles a presence or an absence of a plurality of signals from the spatially separate regions of said substrate. The kit also includes instructions for determining from the plurality of signals at least K bits of information per cycle for one or more of said N distinct target analytes, wherein the at least K bits of information are used to determine L total bits of information, wherein K×M=L bits of information and L>log₂(N), and wherein said L bits of information are used to determine a presence or an absence of one or more of the N distinct target analytes.

In some embodiments, the kit includes one or more probes that comprise an antibody. In other embodiments, the label is a fluorescent label. In another embodiment, the probe is an antibody. In one embodiment, the antibody is conjugated directly to a label. In yet another embodiment, the antibody is bound to a secondary antibody conjugated to a label. In other embodiments, the probe comprises an aptamer. The aptamer can comprise a homopolymeric base region. In some embodiments, the plurality of analytes comprises a protein, a peptide aptamer, or a nucleic acid molecule.

In other embodiments, L>log₂(N). In another embodiment, M≦N. The kit can also include instructions for determining an identification of each of the N distinct target analytes using the L bits of information, wherein L comprises bits of information for target identification.

The kit can include instructions for determining an order of said plurality of ordered probe reagent sets using the L bits of information, wherein L comprises bits of information that are ordered in a predetermined order. The predetermined order can be a random order. The kit can also include instructions for using a key for decoding an order of the plurality of ordered probe reagent sets.

DETAILED DESCRIPTION

The figures and the following description relate to various embodiments of the invention by way of illustration only. It should be noted that from the following discussion, alternative embodiments of the structures and methods disclosed herein will be readily recognized as viable alternatives that may be employed without departing from the principles of what is claimed.

Reference will now be made in detail to several embodiments, examples of which are illustrated in the accompanying figures. It is noted that wherever practicable similar or like reference numbers may be used in the figures and may indicate similar or like functionality. The figures depict embodiments of the disclosed system (or method) for purposes of illustration only. One skilled in the art will readily recognize from the following description that alternative embodiments of the structures and methods illustrated herein may be employed without departing from the principles described herein.

DEFINITIONS

A “target analyte” or “analyte” refers to a molecule, compound, substance or component that is to be identified, quantified, or otherwise characterized. A target analyte can comprise, by way of example and not limitation, an atom, a compound, a molecule (of any molecular size), a polypeptide, a protein (folded or unfolded), an oligonucleotide molecule (RNA, cDNA, or DNA), a fragment thereof, a modified molecule thereof, such as a modified nucleic acid, or a combination thereof. In an embodiment, a target analyte polypeptide or protein is about nine amino acids in length. Generally, a target analyte can be at any of a wide range of concentrations (e.g., from the mg/mL to ag/mL range), in any volume of solution (e.g., as low as the picoliter range). For example, samples of blood, serum, formalin-fixed paraffin embedded (FFPE) tissue, saliva, or urine could contain various target analytes. The target analytes are recognized by probes, which are used to identify and quantify the target analytes using electrical or optical detection methods.

Modifications to a target protein, for example, can include post-translational modifications, such as attaching to a protein other biochemical functional groups (such as acetate, phosphate, various lipids and carbohydrates), changing the chemical nature of an amino acid (e.g., citrullination), or making structural changes (e.g., formation of disulfide bridges). Examples of post-translational modifications also include, but are not limited to, addition of hydrophobic groups for membrane localization (e.g., myristoylation, palmitoylation), addition of cofactors for enhanced enzymatic activity (e.g., lipoylation), modifications of translation factors (e.g., diphthamide formation), addition of chemical groups (e.g., acylation, alkylation, amide bond formation, glycosylation, oxidation), sugar modifications (glycation), addition of other proteins or peptides (ubiquitination), or changes to the chemical nature of amino acids (e.g., deamidation, carbamylation).

In other embodiments, target analytes are oligonucleotides that have been modified. Examples of DNA modifications include DNA methylation and histone modification.

A “probe” as used herein refers to a molecule that is capable of binding to other molecules (e.g., oligonucleotides comprising DNA or RNA, polypeptides or full-length proteins, etc.), cellular components or structures (lipids, cell walls, etc.), or cells for detecting or assessing the properties of the molecules, cellular components or structures, or cells. The probe comprises a structure or component that binds to the target analyte. In some embodiments, multiple probes may recognize different parts of the same target analyte. Examples of probes include, but are not limited to, an aptamer, an antibody, a polypeptide, an oligonucleotide (DNA, RNA), or any combination thereof. Antibodies, aptamers, oligonucleotide sequences and combinations thereof as probes are also described in detail below.

The probe can comprise a tag that is used to detect the presence of the target analyte. The tag can be directly or indirectly bound to, hybridized to, conjugated to, or covalently linked to the target analyte binding component. In some embodiments, the tag is a detectable label, such as a fluorescent molecule or a chemiluminescent molecule. In other embodiments, the tag comprises an oligonucleotide sequence that has a homopolymeric base region (e.g., a poly-A tail). The probe can be detected electrically, optically, or chemically via the tag.

As used herein, the term “tag” refers to a molecule used to detect a target analyte. The tag can be an oligonucleotide sequence that has a homopolymeric base region (e.g., a poly-A tail). In other embodiments, the tag is a label, such as a fluorescent label. The tag can comprise, but is not limited to, a fluorescent molecule, chemiluminescent molecule, chromophore, enzyme, enzyme substrate, enzyme cofactor, enzyme inhibitor, dye, metal ion, metal sol, ligand (e.g., biotin, avidin, streptavidin or haptens), radioactive isotope, and the like. The tag can be directly or indirectly bound to, hybridized to, conjugated to, or covalently linked to the probe.

A “protein” or “polypeptide” or “peptide” refers to a molecule of two or more amino acids, amino acid analogs, or other peptidomimetics. The protein can be folded or unfolded (denatured). The polypeptide or peptide can have a secondary structure, such as an α-helix, β sheet, or other conformation. As used herein, the term “amino acid” refers to natural, unnatural, or synthetic amino acids, including glycine and both the D and L optical isomers, and amino acid analogs and peptidomimetics. A peptide can be two or more amino acids in length. Longer length peptides are often referred to as polypeptides. The term protein can refer to full-length proteins; analogs and fragments thereof are also encompassed by the definition. The terms also include post-expression modifications of the protein or polypeptide, for example, glycosylation, acetylation, phosphorylation and the like. Furthermore, as ionizable amino and carboxyl groups are present in the molecule, a particular polypeptide may be obtained as an acidic or basic salt, or in neutral form. A protein or polypeptide may be obtained directly from the source organism, or may be recombinantly or synthetically produced.

Proteins can be identified and characterized by a peptide sequence, side-chain modifications, and/or tertiary structure. Side-chain modifications include phosphorylation, acetylation, sugars, etc. Phosphorylation of hydroxyl groups from serine, threonine and tyrosine amino acids is a particularly important modification of interest.

The term “in vivo” refers to processes that occur in a living organism.

The term “mammal” as used herein includes both humans and non-humans and includes, but is not limited to, humans, non-human primates, canines, felines, murines, bovines, equines, and porcines.

“Sample” as used herein includes a specimen, culture, or collection from a biological material. Samples may be derived from or taken from a mammal, including, but not limited to, humans, monkeys, rats, or mice. Samples may include materials such as, but not limited to, cultures, blood, tissue, formalin-fixed paraffin embedded (FFPE) tissue, saliva, hair, feces, urine, and the like. These examples are not to be construed as limiting the sample types applicable to the present invention.

A “bit” as used herein refers to a basic unit of information in computing and digital communications. A bit can have only one of two values. The most common representations of these values are 0 and 1. The term bit is a contraction of binary digit. In one example, a system that uses 4 bits of information can create 16 different values (as shown in Table 1A). All single digit hexadecimal numbers can be written with 4 bits. Binary-coded decimal is a digital encoding method for numbers using decimal notation, with each decimal digit represented by four bits. In another example, a calculation using 8 bits has 2⁸ (or 256) possible values.

TABLE 1A Example bit values

Binary    Octal    Decimal    Hexadecimal
0000      0        0          0
0001      1        1          1
0010      2        2          2
0011      3        3          3
0100      4        4          4
0101      5        5          5
0110      6        6          6
0111      7        7          7
1000      10       8          8
1001      11       9          9
1010      12       10         A
1011      13       11         B
1100      14       12         C
1101      15       13         D
1110      16       14         E
1111      17       15         F
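
The conversions listed in Table 1A can be reproduced with a short Python sketch. The function name `bit_table` is an illustrative assumption; the sketch only enumerates the values representable with a given number of bits and is not part of the disclosed method.

```python
def bit_table(n_bits: int = 4):
    """Return (binary, octal, decimal, hexadecimal) strings for every value
    that can be represented with n_bits bits."""
    rows = []
    for value in range(2 ** n_bits):
        rows.append((
            format(value, f"0{n_bits}b"),  # zero-padded binary
            format(value, "o"),            # octal
            str(value),                    # decimal
            format(value, "X"),            # upper-case hexadecimal
        ))
    return rows


if __name__ == "__main__":
    print("Binary  Octal  Decimal  Hexadecimal")
    for binary, octal, decimal, hexadecimal in bit_table(4):
        print(f"{binary:<8}{octal:<7}{decimal:<9}{hexadecimal}")
```

Calling `bit_table(8)` returns 256 rows, matching the 2⁸ possible values noted in the definition of a bit.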

A “pass” in a detection assay refers to a process where a plurality of probes are introduced to the bound analytes, selective binding occurs between the probes and distinct target analytes, and a plurality of signals are detected from the probes. A pass includes introduction of a set of antibodies that bind specifically to a target analyte. There can be multiple passes of different sets of probes before the substrate is stripped of all probes.

A “cycle” is defined by completion of one or more passes and stripping of the probes from the substrate. Subsequent cycles of one or more passes per cycle can be performed. Multiple cycles can be performed on a single substrate or sample. For proteins, multiple cycles will require that the probe removal (stripping) conditions either maintain proteins folded in their proper configuration, or that the probes used are chosen to bind to peptide sequences so that the binding efficiency is independent of the protein fold configuration.
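
One way to picture how passes and cycles could yield bits is sketched below in Python. This is only an assumed reading and not the disclosed encoding: it supposes that, within each cycle, the index of the pass in which a given substrate region produces a signal supplies that cycle's bits (for example, four passes giving 2 bits per cycle), and that the per-cycle bits are concatenated into an identifier. The function name `decode_region_id` and the example observations are hypothetical.

```python
def decode_region_id(pass_hits, passes_per_cycle: int = 4):
    """Concatenate per-cycle pass indices into one integer identifier.

    pass_hits        -- one entry per cycle: the zero-based index of the pass
                        in which the region produced a signal (assumed encoding)
    passes_per_cycle -- number of passes performed in each cycle

    Returns (identifier, total_bits).
    """
    if passes_per_cycle < 2:
        raise ValueError("at least two passes are needed to carry information")
    # Bits needed to index one of passes_per_cycle passes (4 passes -> 2 bits).
    bits_per_cycle = (passes_per_cycle - 1).bit_length()
    identifier = 0
    for hit in pass_hits:
        if not 0 <= hit < passes_per_cycle:
            raise ValueError(f"pass index {hit} is out of range")
        # Shift earlier cycles left and append this cycle's bits.
        identifier = (identifier << bits_per_cycle) | hit
    return identifier, bits_per_cycle * len(pass_hits)


if __name__ == "__main__":
    # Hypothetical region that lit up in pass 2, pass 0, then pass 3.
    region_id, total_bits = decode_region_id([2, 0, 3])
    print(f"{region_id:0{total_bits}b}")
```

Under this assumed scheme, three cycles of four passes give 6 bits per region, enough to distinguish up to 64 identifiers before any bits are reserved for error correction.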

It must be noted that, as used in the specification and the appended claims, the singular forms “a,” “an,” and “the” include plural referents unless the context clearly dictates otherwise.

OVERVIEW

Detection techniques for highly multiplexed single molecule identification and quantification of analytes using both optical and electrical systems are disclosed. Analytes can include, but are not limited to, a protein, a peptide, DNA and RNA molecules, with and without modifications. Electrical detection is accomplished using ion sensitive field effect transistors (ISFET) integrated with MEMS (micro-electro-mechanical systems) structures for enhanced sensitivity. Techniques include poly-A tags with and without differential stops, complementary specific and non-specific probes for detailed characterization of analytes, and highly multiplexed single molecule identification and quantification using antibody probes. Optical detection is accomplished by detection of fluorescent or luminescent tags.

1. Computer System

FIG. 1 is a high-level block diagram illustrating an example of a computer 100 for use in analyzing molecular analytes, in accordance with one embodiment. Illustrated are at least one processor 102 coupled to a chipset 104. The chipset 104 includes a memory controller hub 120 and an input/output (I/O) controller hub 122. A memory 106 and a graphics adapter 112 are coupled to the memory controller hub 120, and a display device 118 is coupled to the graphics adapter 112. A storage device 108, keyboard 110, pointing device 114, and network adapter 116 are coupled to the I/O controller hub 122. Other embodiments of the computer 100 have different architectures. For example, the memory 106 is directly coupled to the processor 102 in some embodiments.

The storage device 108 is a non-transitory computer-readable storage medium such as a hard drive, compact disk read-only memory (CD-ROM), DVD, or a solid-state memory device. The memory 106 holds instructions and data used by the processor 102. The pointing device 114 is used in combination with the keyboard 110 to input data into the computer system 100. The graphics adapter 112 displays images and other information on the display device 118. In some embodiments, the display device 118 includes a touch screen capability for receiving user input and selections. The network adapter 116 couples the computer system 100 to the network. Some embodiments of the computer 100 have different and/or other components than those shown in FIG. 1. For example, the computer 100 can be a server formed of multiple blade servers and lack a display device, keyboard, and other components.

The computer 100 is adapted to execute computer program modules for providing functionality described herein. As used herein, the term “module” refers to computer program instructions and other logic used to provide the specified functionality. Thus, a module can be implemented in hardware, firmware, and/or software. In one embodiment, program modules formed of executable computer program instructions are stored on the storage device 108, loaded into the memory 106, and executed by the processor 102.

2. Compositions

Compositions are provided that bind and tag analytes, such as DNA, RNA,protein, and peptides, in a specific manner, such that individualmolecules can be detected and counted.

Antibodies as Probes

In some embodiments, the probe comprises an antibody that is used to detect target analytes in a sample. As described below, antibodies are immunoglobulins that specifically bind to target proteins or polypeptides. In a preferred embodiment, antibodies used in the invention are monoclonal and can bind specifically to folded or unfolded proteins.

“Antibody” refers to an immunoglobulin that specifically binds to, and is thereby defined as complementary with, another molecule. The antibody is a glycoprotein produced by B-cells that is used by the immune system to identify and neutralize foreign objects, such as bacteria and viruses. The antibody recognizes a unique part of the foreign target, called an antigen. Antibodies are typically made of basic structural units: two large heavy chains and two small light chains. The antibody can be monoclonal or polyclonal, and can be naturally occurring, modified or recombinant. Antibodies can be prepared by techniques that are well known in the art, such as immunization of a host and collection of sera (polyclonal), or by preparing continuous hybrid cell lines and collecting the secreted protein (monoclonal), or by cloning and expressing nucleotide sequences or mutagenized versions thereof coding at least for the amino acid sequences required for specific binding of natural antibodies. Antibodies can include a complete immunoglobulin or fragment thereof, which immunoglobulins include the various classes and isotypes, such as IgA, IgD, IgE, IgG1, IgG2a, IgG2b and IgG3, IgM, etc. Fragments thereof may include Fab, Fv and F(ab′)2, Fab′, and the like.

A “monoclonal antibody” (mAb) is an immunoglobulin produced by a single clone of lymphocytes, i.e., the progeny of a single B cell, which recognizes only a single epitope on an antigen. In addition, aggregates, polymers, and conjugates of immunoglobulins or their fragments can be used where appropriate so long as binding affinity for a particular target is maintained. An antibody (primary antibody) can be covalently linked to a detectable label (e.g., fluorescent label). In other embodiments, a primary antibody binds to a secondary antibody that is covalently linked to a detectable label. In some embodiments, the primary antibody is conjugated to a labeled oligonucleotide molecule, as described in U.S. Pat. No. 7,122,319 to Liu et al. filed on Nov. 5, 2003, which is incorporated by reference in its entirety.

FIG. 2A illustrates an example of a probe comprising an antibody 132 and a detectable tag 134, and the probe binds a target analyte 130. In FIG. 2B, an example is shown of a probe comprising a primary antibody 132 and a secondary antibody 210. The secondary antibody 210 is conjugated to a detectable label 134.

Aptamers

An “aptamer” as used herein refers to a nucleic acid molecule or a peptide molecule that binds to a target analyte. An aptamer can be a component of a probe. In some embodiments, nucleic acid aptamers are nucleic acid molecules that have been engineered through repeated rounds of in vitro selection or equivalently, SELEX (systematic evolution of ligands by exponential enrichment) to bind to various molecular targets, such as small molecules, proteins, nucleic acids, and even cells, tissues and organisms. See Tuerk C & Gold L (1990) Systematic evolution of ligands by exponential enrichment: RNA ligands to bacteriophage T4 DNA polymerase. Science 249:505-510; M. Svobodová, A. Pinto, P. Nadal and C. K. O'Sullivan (2012) Comparison of different methods for generation of single-stranded DNA for SELEX processes. Anal Bioanal Chem 404:835-842. Other methods of aptamer generation include SAAB (selected and amplified binding site) and CASTing (cyclic amplification and selection of targets). Aptamers can bind to a unique n-mer sequence found in a protein (e.g., denatured or folded protein) or polypeptide. In one embodiment, the aptamer binds to a unique 9-mer sequence. In some embodiments, an aptamer can bind to a tag, such as an oligonucleotide strand comprising a homopolymeric base region (e.g., a poly-A tail).

In some embodiments, the probe comprises an aptamer and a tail region. An aptamer is an oligonucleotide or peptide molecule that binds to a specific target analyte. FIG. 3 shows a target analyte 130 that is bound to an aptamer 300. The aptamer 300 includes a probe region 320, which is configured to specifically bind to the target analyte 130. The probe region 320 can comprise a protein, peptide, or nucleic acid, and the probe region 320 recognizes and binds to the target analyte. Each probe region 320 can be coupled to a tag. In some embodiments, the tag is a tail region 310. The tail region 310 is an oligonucleotide molecule of at least 25 nucleotides and serves as a template for polynucleotide synthesis. The tail region 310 is generally a single-stranded DNA molecule, but could also be an RNA molecule. In one embodiment, the tail region 310 is covalently linked to the probe region 320 through a nucleic acid backbone.

In another embodiment, a portion of the tail region 310 specifically binds to a linker region 330. The linker region 330 is covalently linked to the probe region 320 through a nucleic acid backbone. The linker region 330 can be configured to specifically bind to a portion of one tail region 310, or portions of multiple tail regions 310. In an embodiment, the linker region 330 comprises at least 10 nucleotides. In another embodiment, the linker region 330 comprises 20-25 nucleotides. A probe region 320 can be covalently linked to a single linker region 330, or can be covalently linked to multiple distinct linker regions 330 that each specifically binds to a portion of a distinct tail region 310.

The tail region 310 provides a template for polynucleotide synthesis. During polynucleotide synthesis, one hydrogen ion is released for each nucleotide incorporated along the tail region template. A plurality of these hydrogen ions can be detected as an electrical output signal by a transistor. A minimum threshold number of hydrogen ions must be released for the transistor to detect an electrical output signal. For example, the minimum threshold number could be 25, depending on details of the detector configuration. In that case, the tail region 310 must be at least 25 nucleotides long. In some embodiments, the tail region 310 is at least 25, 100, 200, 1000, or 10,000 nucleotides in length. The tail region 310 can include one or more homopolymeric base regions. For example, the tail region 310 can be a poly-A, poly-C, poly-G, or poly-T tail. In another embodiment, the tail region 310 comprises a homopolymeric base region followed by a different homopolymeric base region, for example a poly-A tail followed by a poly-G tail. In one embodiment, the tail region 310 is a DNA-based poly-A tail that is 100 nucleotides in length. Nucleotides (dTTP's) are added under conditions that promote polynucleotide synthesis, and the nucleotides are incorporated along the tail region template, thereby releasing hydrogen ions. If the minimum threshold number of hydrogen ions for the transistor to detect an electrical output signal is 100 or fewer, the transistor will detect an electrical output signal. This signal is used to identify the target analyte associated with the poly-A tail region and potentially determine the concentration of the target analyte in the solution.

In some embodiments, the tail region 310 comprises a homopolymeric base region that includes one or more stop bases. FIG. 3 illustrates a single stop base 330 that is flanked by two homopolymeric base regions. A stop base 330 is a portion of a tail region 310 comprising at least one nucleotide adjacent to a homopolymeric base region, such that the at least one nucleotide is composed of a base that is distinct from the bases within the homopolymeric base region. In one embodiment, the stop base 330 is one nucleotide. In other embodiments, the stop base 330 comprises a plurality of nucleotides. Generally, the stop base 330 is flanked by two homopolymeric base regions. In an embodiment, the two homopolymeric base regions flanking a stop base 330 are composed of the same base. In another embodiment, the two homopolymeric base regions are composed of two different bases. In another embodiment, the tail region 310 contains more than one stop base 330.

Further details about aptamers and tail regions as probes for differential detection of small molecules are described in U.S. Provisional Application No. 61/868,988.

Molecular Tags

In some embodiments, the probe comprises a molecular tag for detection of the target analyte. Tags can be attached chemically or covalently to other regions of the probe. In some embodiments, the tags are fluorescent molecules. Fluorescent molecules can be fluorescent proteins or can be a reactive derivative of a fluorescent molecule known as a fluorophore. FIG. 4 illustrates a fluorescent tag 402 attached to a probe 320. Fluorophores are fluorescent chemical compounds that emit light upon light excitation. In some embodiments, the fluorophore selectively binds to a specific region or functional group on the target molecule and can be attached chemically or biologically. Examples of fluorescent tags include, but are not limited to, green fluorescent protein (GFP), yellow fluorescent protein (YFP), red fluorescent protein (RFP), cyan fluorescent protein (CFP), fluorescein, fluorescein isothiocyanate (FITC), tetramethylrhodamine isothiocyanate (TRITC), cyanine (Cy3), phycoerythrin (R-PE), 5,6-carboxymethyl fluorescein, (5-carboxyfluorescein-N-hydroxysuccinimide ester), Texas red, nitrobenz-2-oxa-1,3-diazol-4-yl (NBD), coumarin, dansyl chloride, and rhodamine (5,6-tetramethyl rhodamine).

Other exemplary fluorescent tags are listed below in Table 1B.

TABLE 1B
Fluorescent Tags

Hydroxycoumarin; Aminocoumarin; Methoxycoumarin; Cascade Blue; Pacific Blue; Pacific Orange; Lucifer yellow; NBD; R-Phycoerythrin (PE); PE-Cy5 conjugates; PE-Cy7 conjugates; Red 613; PerCP; TruRed; FluorX; Fluorescein; BODIPY-FL; TRITC; X-Rhodamine; Lissamine Rhodamine B; Texas Red; Allophycocyanin (APC); APC-Cy7 conjugates; Alexa Fluor 350; Alexa Fluor 405; Alexa Fluor 430; Alexa Fluor 488; Alexa Fluor 500; Alexa Fluor 514; Alexa Fluor 532; Alexa Fluor 546; Alexa Fluor 555; Alexa Fluor 568; Alexa Fluor 594; Alexa Fluor 610; Alexa Fluor 633; Alexa Fluor 647; Alexa Fluor 660; Alexa Fluor 680; Alexa Fluor 700; Alexa Fluor 750; Alexa Fluor 790; Cy2; Cy3

Cy7; DyLight 350; DyLight 405; DyLight 488; DyLight 549; DyLight 594; DyLight 633; DyLight 649; DyLight 680; DyLight 750; DyLight 800; Hoechst 33342; DAPI; Hoechst 33258; SYTOX Blue; Chromomycin A3; Mithramycin; YOYO-1; Ethidium Bromide; Acridine Orange; SYTOX Green; TOTO-1, TO-PRO-1; Thiazole Orange; Propidium Iodide (PI); LDS 751; 7-AAD; SYTOX Orange; TOTO-3, TO-PRO-3; DRAQ5; Indo-1; Fluo-3; DCFH; DHR; SNARF; Y66H; Y66F; EBFP; EBFP2; Azurite; GFPuv; T-Sapphire; TagBFP; Cerulean; mCFP

mKeima-Red; TagCFP; AmCyan1; mTFP1 (Teal); S65A; Midoriishi-Cyan; Wild Type GFP; S65C; TurboGFP; TagGFP; TagGFP2; AcGFP1; S65L; Emerald; S65T; EGFP; Azami-Green; ZsGreen1; Dronpa-Green; TagYFP; EYFP; Topaz; Venus; mCitrine; YPet; TurboYFP; PhiYFP; PhiYFP-m; ZsYellow1; mBanana; Kusabira-Orange; mOrange; mOrange2; mKO; TurboRFP; tdTomato; DsRed-Express2; TagRFP; DsRed monomer; DsRed2 (“RFP”); mStrawberry; TurboFP602; AsRed2; mRFP1

Katushka (TurboFP635); mKate (TagFP635); TurboFP635; mPlum; mRaspberry; mNeptune; E2-Crimson; Monochlorobimane; Calcein; HyPer

Probes Including Antibodies and Oligonucleotides

As shown in FIG. 5, the probe can comprise an antibody 132 linked to a region 410 that can hybridize to an oligonucleotide tail region 310. The oligonucleotide tail region 310 can be bound to the antibody 132 via a linking region 410, such as a polyethylene glycol (PEG) chain, ethylene oxide subunits, or other similar chains that can link the antibody 132 to the oligonucleotide tail 310. In some embodiments, the linking region can include an oligonucleotide that is linked to the antibody peptide using standard chemical methods such as, e.g., NHS ester-maleimide mediated conjugation chemistry, where an N-terminal Cys-incorporated peptide reacts with a maleimide-activated oligo. In other embodiments, linking is accomplished through an internal Cys via oxime formation through a hydroxyl-amine modified peptide reaction with an aldehyde modified oligonucleotide. Such methods are known to the ordinarily skilled artisan. The oligonucleotide sequence in the linking region 410 can hybridize to a portion of the oligonucleotide tail region 310. The oligonucleotide tail region 310 can comprise an oligonucleotide sequence that is used as a template for polynucleotide synthesis and electrical detection, as described above. U.S. Pat. No. 7,122,319, filed on Nov. 5, 2003 to Liu et al., describes various embodiments for analyte binding agents (e.g., antibodies) linked to oligonucleotide tags, and is incorporated by reference in its entirety.

As shown in FIG. 6, the probe comprises a primary antibody 132 and a secondary antibody 210. The primary antibody 132 binds the target analyte 130, and the secondary antibody 210 binds the primary antibody 132. The secondary antibody 210 is conjugated to a linker region 410 that hybridizes to an oligonucleotide tail region 310. The tail region 310 acts as a detectable tag in electrical detection of the target analyte 130.

3. Methods

I. Substrate and Sample Preparation

The present invention provides methods for identifying and quantifying a wide range of analytes, from a single analyte up to tens of thousands of analytes simultaneously over many orders of magnitude of dynamic range, while accounting for errors in the detection assay.

As shown in FIG. 7, a sample comprising analytes 610 (e.g., proteins, peptides, DNA and/or RNA) is bound to a solid substrate 600. The substrate 600 can comprise a glass slide, silicon surface, solid membrane, plate, or the like used as a surface for immobilizing the analytes 610. In one embodiment, the substrate 600 comprises a coating that binds the analytes 610 to the surface. In another embodiment, the substrate 600 comprises capture antibodies or beads for binding the analytes 610 to the surface. The analytes 610 can be bound randomly to the substrate 600 and can be spatially separated on the substrate 600. The sample can be in aqueous solution and washed over the substrate, such that the analytes 610 bind to the substrate 600. In one embodiment, the proteins in the sample are denatured and/or digested using enzymes before binding to the substrate 600. In some embodiments, the analytes 610 can be covalently attached to the substrate. In another embodiment, selected labeled probes are randomly bound to the solid substrate 600, and the analytes 610 are washed across the substrate.

FIG. 8 shows an example substrate 600 (10×10 array) for binding analytes 610, where each array insert 700 has 11×11 (121) target arrays.

FIG. 9 is a top view of a solid substrate 600 with analytes randomly bound to the substrate 600. Different analytes are labeled as A, B, C, and D. For optical detection of the analytes, the imaging system requires that the analytes are spatially separated on the solid substrate 600, so that there is no overlap of fluorescent signals. For a random array, this means that multiple pixels will be needed for each fluorescent spot.

The number of pixels can be as few as 1 and as many as hundreds of pixels per spot. It is expected that the optimal number of pixels per fluorescent spot is between 5 and 20 pixels. In one example, an imaging system has 224 nm pixels, which corresponds to about 20 pixels/μm². For a system with 10 pixels per fluorescent spot on average, there is a surface density of 2 fluorescent spots/μm². This does not mean that the protein surface density needs to be this low. If probes are only chosen for low abundance proteins, then the amount of protein on the surface may be much higher. For instance, if there are, on average, 20,000 proteins per μm² on the surface, and probes are chosen only for the rarest 0.01% (as an integrated sum) of proteins, then the fluorescent protein surface density will be 2 fluorescent spots/μm². In another embodiment, the imaging system has 163 nm pixels. In another embodiment, the imaging system has 224 nm pixels. In a preferred embodiment, the imaging system has 325 nm pixels. In other embodiments, the imaging system has pixels as large as 500 nm.
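The density arithmetic in this paragraph can be checked with a short calculation. The following Python sketch is illustrative only; the function names are hypothetical, and it assumes square pixels and an average (not worst-case) number of pixels per spot.

```python
def spots_per_um2(pixel_nm: float, pixels_per_spot: float) -> float:
    """Resolvable fluorescent spots per square micrometre (square pixels assumed)."""
    pixel_area_um2 = (pixel_nm / 1000.0) ** 2   # area of one pixel in um^2
    pixels_per_um2 = 1.0 / pixel_area_um2       # about 19.9 for 224 nm pixels
    return pixels_per_um2 / pixels_per_spot


def labeled_density(total_per_um2: float, labeled_fraction: float) -> float:
    """Density of probed (fluorescent) molecules when only a fraction is targeted."""
    return total_per_um2 * labeled_fraction


# Pixel sizes named in the text, with 10 pixels per spot on average.
for pixel_nm in (163, 224, 325, 500):
    print(pixel_nm, "nm pixels:", round(spots_per_um2(pixel_nm, 10), 2), "spots/um^2")

# 20,000 proteins per um^2, probes chosen for the rarest 0.01% only.
print(labeled_density(20_000, 0.0001))
```

Under these assumptions, 224 nm pixels with 10 pixels per spot give about 2 spots/μm², which matches the density obtained by probing the rarest 0.01% of 20,000 proteins per μm².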

II. Optical Detection Methods

Optical detection methods can be used to quantify and identify a large number of analytes simultaneously in a sample.

In one embodiment, optical detection of fluorescently-tagged single molecules can be achieved by frequency-modulated absorption and laser-induced fluorescence. Fluorescence can be more sensitive because it is intrinsically amplified, as each fluorophore emits thousands to perhaps a million photons before it is photobleached. Fluorescence emission usually occurs in a four-step cycle: a) electronic transition from the ground-electronic state to an excited-electronic state, the rate of which is a linear function of excitation power, b) internal relaxation in the excited-electronic state, c) radiative or non-radiative decay from the excited state to the ground state as determined by the excited state lifetime, and d) internal relaxation in the ground state. Single molecule fluorescence measurements are considered digital in nature because the measurement relies on a signal/no signal readout independent of the intensity of the signal.

Optical detection requires an optical detection instrument or reader to detect the signal from the labeled probes. U.S. Pat. No. 8,428,454 and U.S. Pat. No. 8,175,452, which are incorporated by reference in their entireties, describe exemplary imaging systems that can be used and methods to improve the systems to achieve sub-pixel alignment tolerances. In some embodiments, methods of aptamer-based microarray technology can be used. See Optimization of Aptamer Microarray Technology for Multiple Protein Targets, Analytica Chimica Acta 564 (2006).

A. Optical Detection of Multiple Analytes Using Tagged Antibodies

The method includes optical detection of analytes using tagged antibodies as probes. For a known target analyte (protein) in the sample, an antibody is selected that specifically binds to the target analyte. Selected antibodies can be those developed for ELISA and comparable systems as single molecule probes. There are hundreds to thousands of existing and qualified primary and secondary antibodies that are readily available. In some embodiments, primary antibodies are selected that are conjugated to a tag, such as a fluorophore. In other embodiments, primary antibodies are selected that bind to secondary antibodies, and the secondary antibodies are conjugated to a fluorescent molecule.

In one embodiment, the method includes selecting a primary antibody that has a known, specific target protein in the sample. The primary antibody is tagged with a detectable tag, such as a fluorescent molecule. The selected primary antibodies are introduced and washed across the substrate. The primary antibodies bind to their target analytes, and signals from the tags are detected.

In another embodiment, a primary antibody and a secondary antibody conjugated to a detectable tag are selected. The selected primary antibodies are introduced and washed across the substrate. The primary antibodies bind to their target analytes. Next, secondary antibodies are washed across the substrate and bind to the primary antibodies. The tags produce a detectable signal, and the signals are detected and analyzed, preferably using a computer, to determine whether a signal is detected at a defined location and, in some embodiments, to obtain additional information about the nature of the signal (e.g., the color of the label).

A pass comprises a binding step and a signal detection step. There can be a number of passes per cycle, where each pass includes binding of a set of tagged antibodies to a different target protein and detection and analysis of signals from the tagged antibodies. There can be multiple passes of different tagged antibodies before the substrate is stripped of all tagged antibodies. A cycle concludes when one or more passes are completed, and the tagged antibodies are stripped from the substrate. Subsequent cycles of one or more passes per cycle can be performed with the same substrate and sample of bound analytes.

An optical detection instrument or reader is used to optically detect each of the signals from the labeled antibodies. The number of signals, location of the signal, and presence or absence of the signal can be recorded and stored. Details about the quantification and identification of the analytes based on the detected optical signals are described below.

1. Multiple Tags for Multiple Analytes

In one embodiment, a plurality of antibodies conjugated to fluorescent tags is used to detect individual proteins bound to a substrate. Each distinct type of protein is tagged with a limited number of fluorescent tags. For example, in a single pass, antibodies are introduced that are tagged with a red fluorescent tag and selectively bind to protein A. The number of red fluors on the substrate is counted after binding. The number of tags counted is proportional to the concentration of protein A.

Each subsequent pass introduces a different fluorescent tag (different color) for detecting a different protein (e.g., blue fluorescent tag for protein B, yellow fluorescent tag for protein C, etc.). The presence of each fluorescent tag is counted at each pass and recorded. FIG. 9 illustrates a solid substrate comprising analytes A, B, C, and D. At each pass, a different analyte can be detected with a different fluorescent tag and counted accordingly.

In some embodiments, a “dark level” is used in the detection and analysis of the analyte. A dark level exists where there is no tag present in the pass and no positive signal is counted, which is referred to as a “dark pass.” The absence of any signal is considered to be a level (i.e., dark cycle counted). Incorporating a dark level allows the number of probes per cycle to be reduced by one. In some embodiments, it is preferred to have a positive signal and not use a dark level because the use of dark levels can be more susceptible to errors. One example embodiment is shown in FIG. 13. In other embodiments, where the raw system error rate is low and the number of probes per cycle is low, the use of a dark level can significantly increase the amount of information transferred per cycle.

A specific case in which the use of a dark level is helpful is where a primary antibody probe is bound to an analyte bound to a substrate, and a fluorescently or electrically tagged secondary antibody is bound to the primary antibody. The secondary antibody can bind non-specifically to all antibodies, so that only one level of information is possible per cycle for a single pass system. In this case, the use of a dark level (i.e., not including a primary antibody in the cycle) is required to achieve 1 bit of information per cycle.
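The effect of a dark level on information per cycle can be illustrated with a small calculation. This is a minimal sketch under stated assumptions: one pass per cycle, error-free readout, and every distinguishable level equally usable. The function name `bits_per_cycle` is hypothetical.

```python
import math


def bits_per_cycle(distinct_tags: int, use_dark_level: bool) -> float:
    """Bits obtainable in one single-pass cycle from the distinguishable levels."""
    levels = distinct_tags + (1 if use_dark_level else 0)
    if levels < 1:
        raise ValueError("at least one distinguishable level is required")
    return math.log2(levels)


# One tagged secondary antibody that labels every primary antibody:
print(bits_per_cycle(1, use_dark_level=False))  # a single level carries no information
print(bits_per_cycle(1, use_dark_level=True))   # signal vs. dark gives one bit

# Three tags plus a dark level carry as much as four tags without one.
print(bits_per_cycle(3, use_dark_level=True), bits_per_cycle(4, use_dark_level=False))
```

In this idealized model the dark level substitutes for one tag, which is the sense in which the number of probes per cycle can be reduced by one.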

To eliminate the use of a dark level when using secondary antibodies, one of two approaches is required: either two or more types of secondary antibodies are used, each having high affinity for one predetermined set of primary antibody probes and low affinity for the other predetermined sets of primary antibody probes, or at least two passes are performed per cycle.

2. Single Tag for Multiple Analytes

In another embodiment, a plurality of antibodies conjugated to fluorescent tags is used to detect individual proteins bound to a substrate. Each type of protein can be tagged with the same fluorescent tag (same color). For example, in one pass, antibody probes tagged with a red fluorescent tag selectively bind to protein A, and the number of red fluors on the substrate is counted. On a second pass, antibody probes tagged with a red fluorescent molecule that specifically bind to protein B are introduced, and the presence of the additional red fluorescent tags at additional locations on the substrate is counted and recorded. Multiple passes can be performed using antibodies labeled with the same fluorescent label that specifically bind different target proteins. The presence of additional red fluorescent tags detected on the substrate at each pass is counted and recorded. One example embodiment is shown in FIG. 14.
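The single-tag scheme identifies each protein by the pass in which its spot first appears. The Python sketch below illustrates that bookkeeping. It assumes that tags are not stripped between passes (so signals accumulate), that binding is error-free, and that each image has already been reduced to a set of lit positions; the function name and the coordinates are hypothetical.

```python
from collections import Counter


def assign_by_first_appearance(pass_targets, lit_positions_per_pass):
    """Map each position to the target probed in the pass where it first lit up."""
    seen = set()
    calls = {}
    for target, lit in zip(pass_targets, lit_positions_per_pass):
        for position in sorted(lit - seen):   # spots that are new in this pass
            calls[position] = target
        seen |= lit
    return calls


# Three passes, all with the same red tag, probing proteins A, B and C in turn.
targets = ["A", "B", "C"]
images = [
    {(0, 1), (3, 2)},
    {(0, 1), (3, 2), (5, 5)},
    {(0, 1), (3, 2), (5, 5), (1, 4), (2, 2)},
]
calls = assign_by_first_appearance(targets, images)
counts = Counter(calls.values())
print(counts["A"], counts["B"], counts["C"])
```

The per-target counts are the quantities that would be recorded at each pass; relating them to concentration requires calibration that is outside this sketch.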

B. Methods for Optical Detection of Analytes

The high dynamic-range analyte quantification methods of the invention allow the measurement of over 10,000 analytes from a biological sample. The method can quantify analytes with concentrations from about 1 ag/mL to about 50 mg/mL and produce a dynamic range of more than 10¹⁰. The optical signals are digitized, and analytes are identified based on a code (ID code) of digital signals for each analyte.

As described above, analytes are bound to a solid substrate, and probes are bound to the analytes. Each of the probes comprises tags and specifically binds to a target analyte. In some embodiments, the tags are fluorescent molecules that emit the same fluorescent color, and the signals for additional fluors are detected at each subsequent pass. During a pass, a set of probes comprising tags is contacted with the substrate, allowing the probes to bind to their targets. An image of the substrate is captured, and the detectable signals are analyzed from the image obtained after each pass. The information about the presence and/or absence of detectable signals is recorded for each detected position (e.g., target analyte) on the substrate.

In some embodiments, the invention comprises methods that include steps for detecting optical signals emitted from the probes comprising tags, counting the signals emitted during multiple passes and/or multiple cycles at various positions on the substrate, and analyzing the signals as digital information using a K-bit based calculation to identify each target analyte on the substrate. Error correction can be used to account for errors in the optically-detected signals, as described below.

In some embodiments, a substrate is bound with analytes comprising N target analytes. To detect N target analytes, M cycles of probe binding and signal detection are chosen. Each of the M cycles includes 1 or more passes, and each pass includes N sets of probes, such that each set of probes specifically binds to one of the N target analytes. In certain embodiments, there are N sets of probes for the N target analytes.

In each cycle, there is a predetermined order for introducing the sets of probes for each pass. In some embodiments, the predetermined order for the sets of probes is a randomized order. In other embodiments, the predetermined order for the sets of probes is a non-randomized order. In one embodiment, the non-random order can be chosen by a computer processor. The predetermined order is represented in a key for each target analyte. A key is generated that includes the order of the sets of probes, and the order of the probes is digitized in a code to identify each of the target analytes.

In some embodiments, each set of ordered probes is associated with a distinct tag for detecting the target analyte, and the number of distinct tags is less than the number N of target analytes. In that case, each of the N target analytes is matched with a sequence of M tags for the M cycles. The ordered sequence of tags is associated with the target analyte as an identifying code.

In one example, there are 16 target proteins and 16 distinct probes, one for each of the target proteins, but only four fluorescent tags (red, blue, green, and yellow). FIG. 10A illustrates an example of the 16 target proteins (labeled P1, P2, P3, etc.) arranged on a substrate. The assay can be set up with two cycles and one pass per cycle. Accordingly, two ordered probe pools are created (one pool per cycle). Each probe pool uses the four tags, so that across the two pools the 16 target proteins are labeled in a unique 2-color sequence.

Table 2 below shows the 16 target analytes and corresponding probe numbers. Table 3 shows the four fluorescent tags (labeled 0 through 3). Tables 4 and 5 show two probe pools where each of the 16 target analytes is labeled with a first fluorescent tag in probe pool 1 and a second fluorescent tag in probe pool 2. FIG. 10B illustrates the substrate of FIG. 10A that has been contacted with probe pool 1. FIG. 10C illustrates the substrate of FIG. 10A that has been contacted with probe pool 2. For example, probe A2 is tagged with a blue fluorescent tag in probe pool 1, and a red fluorescent tag in probe pool 2. Accordingly, in cycle 1, probe A2 (bound to analyte P2) will emit a blue color, and in cycle 2, probe A2 will emit a red color. In another example, illustrated by FIG. 10D, probe A7 has a green (GRN) tag in probe pool 1 (or cycle number 1). In probe pool 2, probe A7 has a blue (BLU) tag. In each probe pool, several probes share the same tag color, but the sequence of colors across the two pools is unique for each analyte. In probe pool #1, for example, probes A4 and A8 are both tagged yellow. Only probe A9, however, is tagged red in probe pool #1 and green in probe pool #2.

TABLE 2
Analyte and Probe

Analyte Number    Probe Number
P1                A1
P2                A2
P3                A3
P4                A4
P5                A5
P6                A6
P7                A7
P8                A8
P9                A9
P10               A10
P11               A11
P12               A12
P13               A13
P14               A14
P15               A15
P16               A16

TABLE 3
Tag Number and Color

Tag Number    Tag Color    Abbreviation
0             Red          RED
1             Blue         BLU
2             Green        GRN
3             Yellow       YLW

TABLE 4
Probe Pool 1

Probe Pool #1
Probe Number    Tag Color
A1              RED
A2              BLU
A3              GRN
A4              YLW
A5              RED
A6              BLU
A7              GRN
A8              YLW
A9              RED
A10             BLU
A11             GRN
A12             YLW
A13             RED
A14             BLU
A15             GRN
A16             YLW

TABLE 5
Probe Pool 2

Probe Pool #2
Probe Number    Tag Color
A1              RED
A2              RED
A3              RED
A4              RED
A5              BLU
A6              BLU
A7              BLU
A8              BLU
A9              GRN
A10             GRN
A11             GRN
A12             GRN
A13             YLW
A14             YLW
A15             YLW
A16             YLW
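The two probe pools in Tables 4 and 5 follow a regular pattern: if the probes are indexed from 0 to 15, the pool 1 color is the low base-4 digit of the index and the pool 2 color is the high digit. That pattern is an observation about the tables rather than a stated rule, and the following Python sketch (with hypothetical function names) simply reproduces the tables on that assumption and decodes a two-color observation back to an analyte.

```python
TAG_COLORS = ["RED", "BLU", "GRN", "YLW"]  # tag numbers 0..3 as in Table 3


def build_pools(n_probes: int = 16):
    """Reproduce Tables 4 and 5: one tag color per probe in each pool."""
    pool1, pool2 = {}, {}
    for i in range(n_probes):
        probe = f"A{i + 1}"
        pool1[probe] = TAG_COLORS[i % 4]    # low base-4 digit of the probe index
        pool2[probe] = TAG_COLORS[i // 4]   # high base-4 digit of the probe index
    return pool1, pool2


def decode(color_cycle1: str, color_cycle2: str) -> str:
    """Identify the analyte at a position from its colors in cycles 1 and 2."""
    index = TAG_COLORS.index(color_cycle2) * 4 + TAG_COLORS.index(color_cycle1)
    return f"P{index + 1}"


pool1, pool2 = build_pools()
print(pool1["A2"], pool2["A2"])   # BLU RED, as in the A2 example
print(pool1["A7"], pool2["A7"])   # GRN BLU, as in the A7 example
print(decode("RED", "GRN"))       # P9: the only probe that is red, then green
```

Because each of the 16 (cycle 1, cycle 2) color pairs occurs exactly once, two cycles with four colors are enough to identify every analyte in this example.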

Table 6 shows an example of a key comprising an ID (identification) code for each target analyte based on color sequence. The table shows N protein targets by name, a corresponding base-10 number (1 to 10,000), a base-M number (e.g., base 4 with 7 digits shown here), and a color sequence. The color sequence is the order and type of detected signal (red, blue, green, yellow) that was emitted for a particular analyte. The key provides a corresponding base-M number (e.g., base 4, 7 digits) and the identity of the target analyte that corresponds with each color sequence. Accordingly, the base-4 calculation allows for an ordered color sequence of 7 signals, and identification of over 10,000 different target analytes, each having its own identifying color sequence.

TABLE 6
Key of protein targets by name, base-10 number, base-M number and color sequence

Target list by name          Target list by    Target list by    Target list by
                             base-10 number    base-M number     color sequence
Alpha-1-acid Glycoprotein    1                 0000001           RRRRRRB
Apolipoprotein B             2                 0000002           RRRRRRG
Myoglobin                    3                 0000003           RRRRRRY
L-Selectin                   6,751             1221133           BGGBBYY
MMP9                         9,999             2130033           GBYRRYY
Troponin T                   10,000            2130100           GBYRBRR
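The rows of Table 6 can be regenerated by writing each base-10 target number as a 7-digit base-4 number and mapping the digits 0, 1, 2, 3 to R, B, G, Y. The following Python sketch shows one way to do that; the function names are hypothetical, and the digit-to-color mapping is taken from Table 3.

```python
COLOR_LETTERS = "RBGY"  # digit 0 -> Red, 1 -> Blue, 2 -> Green, 3 -> Yellow


def to_digits(number: int, base: int = 4, width: int = 7) -> list:
    """Fixed-width base-`base` digits of `number`, most significant digit first."""
    digits = []
    for _ in range(width):
        number, digit = divmod(number, base)
        digits.append(digit)
    if number:
        raise ValueError("number does not fit in the requested width")
    return digits[::-1]


def id_code(number: int) -> str:
    """7-digit base-4 ID code, e.g. 6751 -> '1221133'."""
    return "".join(str(d) for d in to_digits(number))


def color_sequence(number: int) -> str:
    """Color sequence for a target number, e.g. 6751 -> 'BGGBBYY'."""
    return "".join(COLOR_LETTERS[d] for d in to_digits(number))


def decode_sequence(sequence: str) -> int:
    """Recover the base-10 target number from an observed color sequence."""
    number = 0
    for letter in sequence:
        number = number * 4 + COLOR_LETTERS.index(letter)
    return number


for n in (1, 2, 3, 6751, 9999, 10000):
    print(n, id_code(n), color_sequence(n))
```

Looking up the decoded number in the key then gives the target name, for example `decode_sequence("BGGBBYY")` yields 6751, which Table 6 lists as L-Selectin.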

In one embodiment, the method includes the following steps for labeling probe pools to count N different kinds of target analytes on a substrate using fluorescently tagged probes of X different colors:

1. Number a list of the N targets (or their probes) using base-X numbers.

2. Associate fluorescent tags with base-X digits from 0 to X-1. (For example, 0, 1, 2, 3 correspond to red, blue, green, yellow.)

3. Find C such that X^C > N.

4. At least C probe pools are needed to identify the N targets. Label the C probe pools by an index k=1 to C.

5. In the kth probe pool, label each probe with a fluorescent tag of the color that corresponds to the kth base-X digit of the base-X number that identifies the probe's target in the list created in Step 1.

For example, if one has N=10,000 target analytes and four fluorescent tags, a base 4 can be chosen. The 4 fluorescent tag colors are designated with the numbers 0, 1, 2, and 3, respectively. For example, numbers 0, 1, 2, 3 correspond to red, blue, green, and yellow.

When base 4 is chosen, each fluorescent color is represented by 2 bits (each bit being 0 or 1, where 0=no signal and 1=signal present), and a sequence of 7 colors is used as a code to identify a target analyte. For example, protein A may be identified with the code “1221133”, which represents the color order “blue, green, green, blue, blue, yellow, yellow.” For the 7 colors in the sequence, there are a total of 14 bits of information for the target analyte (7×2=14 bits).

Next, C is chosen such that 4^C > 10,000. In this case, C can be 7, such that there are 7 probe pools to identify 10,000 targets (4⁷=16,384, which is greater than 10,000). A color sequence of length C means that C different probe pools must be constructed. The 7 probe pools are labeled from k=1 to 7. Then, in the kth probe pool, each probe is labeled with the fluorescent tag that corresponds to the kth base-X digit of its target's code. For example, in the third probe pool, the probe for the code “1221133” is labeled according to the 3rd base-4 digit, which is 2 and corresponds to green.
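Steps 1 through 5 above can be sketched as a short routine that picks C and then assigns a tag color to every probe in every pool. This is an illustrative sketch only: the names `min_pools` and `build_probe_pools` are hypothetical, targets are numbered from 1 as in Table 6, and the kth digit is counted from the most significant end, as in the “1221133” example.

```python
def min_pools(n_targets: int, n_colors: int) -> int:
    """Smallest C with n_colors**C > n_targets (Step 3)."""
    if n_colors < 2:
        raise ValueError("at least two colors are needed")
    c = 1
    while n_colors ** c <= n_targets:
        c += 1
    return c


def build_probe_pools(targets, colors):
    """Return C dicts; pools[k - 1][target] is the tag color used in pool k."""
    x = len(colors)
    c = min_pools(len(targets), x)
    pools = [dict() for _ in range(c)]
    for number, target in enumerate(targets, start=1):   # Step 1: number the targets
        digits = []
        remaining = number
        for _ in range(c):                               # fixed-width base-X digits
            remaining, digit = divmod(remaining, x)
            digits.append(digit)
        digits.reverse()                                 # most significant digit first
        for k in range(c):                               # Step 5: kth digit -> color
            pools[k][target] = colors[digits[k]]
    return pools


colors = ["red", "blue", "green", "yellow"]              # Step 2: digits 0..3
targets = [f"T{i}" for i in range(1, 10_001)]            # placeholder target names
pools = build_probe_pools(targets, colors)
print(len(pools))                                        # 7 pools for 10,000 targets
print([pool["T6751"] for pool in pools])
```

For target number 6751 (code 1221133) the third pool uses green, in agreement with the example in the text.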

C. Quantification of Optically-Detected Probes

After the detection process, the signals from each probe pool are counted, and the presence or absence of a signal and the color of the signal can be recorded for each position on the substrate.

From the detectable signals, K bits of information are obtained in each of M cycles for the N distinct target analytes. The K bits of information are used to determine L total bits of information, such that K×M=L bits of information and L≥log₂(N). The L bits of information are used to determine the identity (and presence) of N distinct target analytes. If only one cycle (M=1) is performed, then K×1=L. However, multiple cycles (M>1) can be performed to generate more total bits of information L per analyte. Each subsequent cycle provides additional optical signal information that is used to identify the target analyte.
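The relationship K×M=L with L≥log₂(N) fixes the smallest number of cycles for a given number of bits per cycle. A minimal sketch of that calculation follows; the function name `min_cycles` is hypothetical, and the calculation ignores the extra bits that error correction would require.

```python
import math


def min_cycles(n_targets: int, bits_per_cycle: float) -> int:
    """Smallest M such that K * M >= log2(N)."""
    return math.ceil(math.log2(n_targets) / bits_per_cycle)


# Four colors give K = 2 bits per cycle.
m = min_cycles(10_000, 2)
print(m, 2 * m)            # 7 cycles, L = 14 bits, and log2(10,000) is about 13.3
print(min_cycles(16, 2))   # 2 cycles for the 16-protein example
```

With N=10,000 and K=2 this gives M=7 and L=14 bits, consistent with the 7-digit base-4 codes described above.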

In practice, errors in the signals occur, and this confounds the accuracy of the identification of target analytes. For instance, probes may bind the wrong targets (e.g., false positives) or fail to bind the correct targets (e.g., false negatives). Methods are provided, as described below, to account for errors in optical and electrical signal detection.

III. Electrical Detection Methods

In other embodiments, electrical detection methods are used to detect the presence of target analytes on a substrate. Target analytes are tagged with oligonucleotide tail regions, and the oligonucleotide tags are detected using ion-sensitive field-effect transistors (ISFETs, or pH sensors), which measure hydrogen ion concentrations in solution. ISFETs are described in further detail in U.S. Pat. No. 7,948,015, filed on Dec. 14, 2007, to Rothberg et al., and U.S. Publication No. 2010/0301398, filed on May 29, 2009, to Rothberg et al., which are both incorporated by reference in their entireties.

ISFETs present a sensitive and specific electrical detection system for the identification and characterization of analytes. In one embodiment, the electrical detection methods disclosed herein are carried out by a computer (e.g., a processor). The ionic concentration of a solution can be converted to a logarithmic electrical potential by an electrode of an ISFET, and the electrical output signal can be detected and measured.

ISFETs have previously been used to facilitate DNA sequencing. During the enzymatic conversion of single-stranded DNA into double-stranded DNA, hydrogen ions are released as each nucleotide is added to the DNA molecule. An ISFET detects these released hydrogen ions and can determine when a nucleotide has been added to the DNA molecule. By synchronizing the incorporation of the nucleoside triphosphates (dATP, dCTP, dGTP, and dTTP), the DNA sequence may also be determined. For example, if no electrical output signal is detected when the single-stranded DNA template is exposed to dATP's, but an electrical output signal is detected in the presence of dGTP's, the DNA sequence is composed of a complementary cytosine base at the position in question.

In one embodiment, an ISFET is used to detect a tail region of a probe and then identify the corresponding target analyte. For example, a target analyte can be immobilized on a substrate, such as an integrated-circuit chip that contains one or more ISFETs. When the corresponding probe (e.g., aptamer and tail region) is added and specifically binds to the target analyte, nucleotides and enzymes (polymerase) are added for polynucleotide synthesis along the tail region. The ISFET detects the released hydrogen ions as electrical output signals and measures the change in ion concentration when the dNTP's are incorporated along the tail region. The amount of hydrogen ions released corresponds to the lengths and stops of the tail region, and this information about the tail regions can be used to differentiate among various tags.

The simplest type of tail region is one composed entirely of one homopolymeric base region. In this case, there are four possible tail regions: a poly-A tail, a poly-C tail, a poly-G tail, and a poly-T tail. However, it is often desirable to have a great diversity in tail regions.

One method of generating diversity in tail regions is by providing stop bases within a homopolymeric base region of a tail region. A stop base is a portion of a tail region comprising at least one nucleotide adjacent to a homopolymeric base region, such that the at least one nucleotide is composed of a base that is distinct from the bases within the homopolymeric base region. In one embodiment, the stop base is one nucleotide. In other embodiments, the stop base comprises a plurality of nucleotides. Generally, the stop base is flanked by two homopolymeric base regions. In an embodiment, the two homopolymeric base regions flanking a stop base are composed of the same base. In another embodiment, the two homopolymeric base regions are composed of two different bases. In another embodiment, the tail region contains more than one stop base.

In one example, an ISFET can detect a minimum threshold number of 100 hydrogen ions. Target Analyte 1 is bound to a composition with a tail region composed of a 100-nucleotide poly-A tail, followed by one cytosine base, followed by another 100-nucleotide poly-A tail, for a tail region length total of 201 nucleotides. Target Analyte 2 is bound to a composition with a tail region composed of a 200-nucleotide poly-A tail. Upon the addition of dTTP's and under conditions conducive to polynucleotide synthesis, synthesis on the tail region associated with Target Analyte 1 will release 100 hydrogen ions, which can be distinguished from polynucleotide synthesis on the tail region associated with Target Analyte 2, which will release 200 hydrogen ions. The ISFET will detect a different electrical output signal for each tail region. Furthermore, if dGTP's are added, followed by more dTTP's, the tail region associated with Target Analyte 1 will then release one, then 100 more hydrogen ions due to further polynucleotide synthesis. The distinct electrical output signals generated from the addition of specific nucleoside triphosphates based on tail region compositions allow the ISFET to detect hydrogen ions from each of the tail regions, and that information can be used to identify the tail regions and their corresponding target analytes.
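The hydrogen ion counts in this example can be reproduced with a simple idealized model. The Python sketch below assumes one hydrogen ion per incorporated nucleotide, complete extension through a homopolymeric region during each nucleotide addition, no misincorporation, and a single template per sensor; the function names are hypothetical and the model is not a description of real ISFET signal processing.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def ions_per_flow(tail: str, flows: list) -> list:
    """Hydrogen ions released for each nucleotide addition (flow), in order.

    `tail` lists the template bases in the order the polymerase reads them;
    each entry of `flows` is the base of the dNTP added (e.g. 'T' for dTTP).
    """
    position = 0
    released = []
    for nucleotide in flows:
        count = 0
        # Synthesis continues until the next template base needs a different dNTP.
        while position < len(tail) and COMPLEMENT[tail[position]] == nucleotide:
            position += 1
            count += 1
        released.append(count)
    return released


def above_threshold(counts: list, threshold: int) -> list:
    """Which flows release at least the sensor's minimum number of ions."""
    return [count >= threshold for count in counts]


tail_1 = "A" * 100 + "C" + "A" * 100   # Target Analyte 1: poly-A, stop base, poly-A
tail_2 = "A" * 200                     # Target Analyte 2: 200-nucleotide poly-A
flows = ["T", "G", "T"]                # dTTP, then dGTP, then dTTP

print(ions_per_flow(tail_1, flows))    # [100, 1, 100]
print(ions_per_flow(tail_2, flows))    # [200, 0, 0]
print(above_threshold(ions_per_flow(tail_1, flows), threshold=100))
```

In this model the two tails differ both in the size of the first signal (100 versus 200 ions) and in whether the third addition produces a signal at all, which is the kind of difference that can distinguish the two tags.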

Various lengths of the homopolymeric base regions, stop bases, and combinations thereof can be used to uniquely tag each analyte in a sample. Additional description of electrical detection of aptamers and tail regions to identify target analytes in a substrate is provided in U.S. Provisional Application No. 61/868,988, which is incorporated by reference in its entirety.

In other embodiments, antibodies are used as probes in the electrical detection method described above. The antibodies may be primary or secondary antibodies that bind via a linker region to an oligonucleotide tail region that acts as a tag. Examples of such probes are shown in FIGS. 2, 5 and 6.

These electrical detection methods can be used for the simultaneous detection of hundreds (or even thousands) of distinct target analytes. Each target analyte can be associated with a digital identifier, such that the number of distinct digital identifiers is proportional to the number of distinct target analytes in a sample. The identifier may be represented by a number of bits of digital information and is encoded within an ordered tail region set. Each tail region in an ordered tail region set is sequentially made to specifically bind a linker region of a probe region that is specifically bound to the target analyte. Alternatively, if the tail regions are covalently bonded to their corresponding probe regions, each tail region in an ordered tail region set is sequentially made to specifically bind a target analyte.

In one embodiment, one cycle is represented by a binding and stripping of a tail region to a linker region, such that polynucleotide synthesis occurs and releases hydrogen ions, which are detected as an electrical output signal. Thus, the number of cycles for the identification of a target analyte is equal to the number of tail regions in an ordered tail region set. The number of tail regions in an ordered tail region set is dependent on the number of target analytes to be identified, as well as the total number of bits of information to be generated. In another embodiment, one cycle is represented by a tail region covalently bonded to a probe region specifically binding and being stripped from the target analyte.

The electrical output signal detected from each cycle is digitized into bits of information, so that after all cycles have been performed to bind each tail region to its corresponding linker region, the total bits of obtained digital information can be used to identify and characterize the target analyte in question. The total number of bits is dependent on a number of identification bits for identification of the target analyte, plus a number of bits for error correction. The number of bits for error correction is selected based on the desired robustness and accuracy of the electrical output signal. Generally, the number of error correction bits will be 2 or 3 times the number of identification bits.

IV. Decoding the Order and Identity of Detected Analytes

The probes used to detect the analytes are introduced to the substrate in an ordered manner in each cycle. A key is generated that encodes information about the order of the probes for each target analyte. The signals detected for each analyte can be digitized into bits of information. The order of the signals provides a code for identifying each analyte, which can be encoded in bits of information.

In one example for optical detection of analytes, using 1-bit of information, each analyte is associated with an ordered set of probes. Table 7 below illustrates that each target analyte is associated with a predetermined order of a set of probes introduced over 7 cycles, and the order of the signals emitted from the ordered set of probes is used as a code for identifying the target analyte. For example, for alpha-1-acid glycoprotein, the identifying code is an ordered set of probes of six red (R) signals followed by a final blue (B) signal. When a set of signals is received for a target analyte that reads “RRRRRRB,” the key is used to find a match between the identifying code of an order for the probes and the obtained signals from the analyte. Accordingly, the code is used to determine that the target analyte is alpha-1-acid glycoprotein.

TABLE 7 Key For Target Analytes Based On An Ordered Set Of Probes Over 7 Cycles and Corresponding Identifying Code

                             Cycles
Target Analyte               1  2  3  4  5  6  7    Signal     Code
Alpha-1-acid Glycoprotein    R  R  R  R  R  R  B    RRRRRRB    0000001
Apolipoprotein B             R  R  R  R  R  R  G    RRRRRRG    0000002
Myoglobin                    R  R  R  R  R  R  Y    RRRRRRY    0000003
L-Selectin                   B  G  G  B  B  Y  Y    BGGBBYY    1221133
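For illustration only, the use of the key of Table 7 to match an ordered set of signals to a target analyte may be sketched in Python as follows. The names KEY, DIGIT, to_code and identify are hypothetical; the digit assigned to each color (R=0, B=1, G=2, Y=3) is inferred from the Signal and Code columns of Table 7.

```python
# Sketch: matching an observed ordered signal sequence against the key.
# Key entries are taken from Table 7.
KEY = {
    "RRRRRRB": "Alpha-1-acid Glycoprotein",
    "RRRRRRG": "Apolipoprotein B",
    "RRRRRRY": "Myoglobin",
    "BGGBBYY": "L-Selectin",
}

# Digit per color, as implied by the Signal and Code columns of Table 7.
DIGIT = {"R": "0", "B": "1", "G": "2", "Y": "3"}


def to_code(signals):
    """Convert an ordered color sequence into its identifying code."""
    return "".join(DIGIT[color] for color in signals)


def identify(signals):
    """Return the target analyte for a signal sequence, or None if no match."""
    return KEY.get(signals)


print(identify("RRRRRRB"), to_code("BGGBBYY"))
```

A user without access to KEY can still record the ordered signals, but cannot map them to target analytes, which corresponds to the embodiments in which the key is withheld from the kit.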

In some embodiments, the user of a kit comprising the ordered set of probes and instructions for using the probe does not have access to the code, such that he or she cannot match the ordered set of signals to the corresponding target analyte. In one embodiment, the kit does not include the key for decoding the results, and the user sends the data to a third party for processing of the data using the code. In another embodiment, the key with ID codes is provided to a user of the kit, and the user can decipher the ordered set of signals to the target analyte.

In a second example, each color (fluorescent signal) can be represented by a 2-bit sequence, and a 2-color sequence can be represented by a 4-bit data symbol. Table 8 provides an example of four colors (red, blue, green and yellow) and their corresponding bit values. For example, a color sequence “BGGBBYY” for a particular analyte may be encoded in 14 bits as 01101001011111 according to the bit scheme shown in Table 8.

TABLE 8 2-Bit Assignments for Fluorescent Labels

Color    Bits
R        00
B        01
G        10
Y        11
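As a minimal sketch, the 2-bit assignments of Table 8 may be applied to a color sequence as follows. The function names encode and decode are illustrative and are not part of the described method.

```python
# Sketch: encoding a color sequence with the 2-bit assignments of Table 8.
BITS = {"R": "00", "B": "01", "G": "10", "Y": "11"}
COLOR = {bits: color for color, bits in BITS.items()}


def encode(colors):
    """Map each color to its 2-bit value and concatenate the results."""
    return "".join(BITS[color] for color in colors)


def decode(bits):
    """Split a bit string into 2-bit groups and map each back to a color."""
    return "".join(COLOR[bits[i:i + 2]] for i in range(0, len(bits), 2))


encoded = encode("BGGBBYY")
print(encoded, decode(encoded))
```

For the sequence “BGGBBYY” this reproduces the 14-bit value given in the second example.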

The order of the probes can be different for each analyte for each new cycle (when a cycle includes multiple passes) or for each set of cycles. The key used to identify an analyte in one set of cycles does not have to be used again in a second assay. The codes for the target analytes can be altered for each assay.

In some embodiments, the predetermined order of the ordered set of probes is chosen randomly. In other embodiments, the predetermined order is not random. In one embodiment, computer software is used to specify the order.

V. Error-Correction Methods

In the optical and electrical detection methods described above, errors can occur in binding and/or detection of signals. In some cases, the error rate can be as high as one in five (e.g., one out of five fluorescent signals is incorrect). This equates to one error in every five-cycle sequence. Actual error rates may not be as high as 20%, but error rates of a few percent are possible. In general, the error rate depends on many factors including the type of analytes in the sample and the type of probes used. In an electrical detection method, for example, a tail region may not properly bind to the corresponding probe region on an aptamer during a cycle. In an optical detection method, an antibody probe may fail to bind to its target or may bind to the wrong target.

Additional cycles are generated to account for errors in the detected signals and to obtain additional bits of information, such as parity bits. The additional bits of information are used to correct errors using an error correcting code. In one embodiment, the error correcting code is a Reed-Solomon code, which is a non-binary cyclic code used to detect and correct errors in a system. In other embodiments, various other error correcting codes can be used. Other error correcting codes include, for example, block codes, convolutional codes, Golay codes, Hamming codes, BCH codes, AN codes, Reed-Muller codes, Goppa codes, Hadamard codes, Walsh codes, Hagelbarger codes, polar codes, repetition codes, repeat-accumulate codes, erasure codes, online codes, group codes, expander codes, constant-weight codes, tornado codes, low-density parity check codes, maximum distance codes, burst error codes, Luby transform codes, fountain codes, and raptor codes. See Error Control Coding, 2^(nd) Ed., S. Lin and DJ Costello, Prentice Hall, New York, 2004. Examples are also provided below that demonstrate the method for error correction by adding cycles and obtaining additional bits of information.

One example of a Reed-Solomon code includes a RS (15,9) code with 4-bit symbols, where n=15, k=9, s=4, and t=3, and n=2^(s)−1 and k=n−2t, “n” being the number of symbols, “k” being the number of data symbols, “s” being the size of each symbol in bits, “t” being the number of errors that can be corrected, and “2t” being the number of parity symbols. There are nine data symbols (k=9) and six parity symbols (2t=6). If base-X numbers are used, and X=4, then each fluorescent color is represented by two bits (each 0 or 1). A pair of colors may be represented by a four-bit symbol that includes two high bits and two low bits.

FIG. 11 illustrates the RS (15,9) example structure. Since base-4 was chosen, seven probe pools, or a sequence of seven colors, are used to identify each target analyte. This sequence is represented by 3½ 4-bit symbols. The remaining 5½ data symbols are set to zero. A Reed-Solomon RS (15,9) encoder then generates the six parity symbols, represented by 12 additional probe pools. Thus, a total of 19 probe pools (7+12) are required to obtain error correction for t=3 symbols.
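The arithmetic of the RS (15,9) example may be checked with the following Python sketch. The function names rs_parameters and probe_pools are hypothetical, and the sketch only computes code parameters and pool counts under the stated relations n=2^(s)−1 and k=n−2t; it does not implement a Reed-Solomon encoder.

```python
# Sketch: parameter arithmetic for the RS (15,9) example with 4-bit symbols.
def rs_parameters(s, t):
    """Return (n, k) for symbol size s (bits) and t correctable symbol errors."""
    n = 2 ** s - 1
    k = n - 2 * t
    return n, k


def probe_pools(id_colors, s, t, bits_per_color=2):
    """Total probe pools: identification colors plus colors carrying parity."""
    colors_per_symbol = s // bits_per_color
    parity_pools = 2 * t * colors_per_symbol
    return id_colors + parity_pools


n, k = rs_parameters(s=4, t=3)
id_symbols = 7 * 2 / 4             # seven 2-bit colors expressed in 4-bit symbols
zero_symbols = k - id_symbols      # data symbols set to zero
total_pools = probe_pools(id_colors=7, s=4, t=3)
distinct_targets = 4 ** 7          # base-4 sequences of seven colors

print(n, k, id_symbols, zero_symbols, total_pools, distinct_targets)
```

The count of 4⁷ distinct seven-color sequences equals 16,384, the number of distinct targets assumed in the simulations described in this section.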

Monte Carlo simulations of error-correcting code performance have been performed assuming seven probe pools, to identify up to 16,384 distinct targets. Using these simulations, the maximum permissible raw error rate (associated with identifying a fluorescent label) to achieve a corrected error rate of 10⁻⁵ was determined for different numbers of parity bits. Table 10A below illustrates these findings.

TABLE 10A Error rates

Reed-Solomon parity    Maximum permissible raw error rate to
symbol number          ensure a corrected error rate <10⁻⁵
6                      2%
8                      5%
10                     10%

In some embodiments, a key is generated that includes the expected bits of information associated with an analyte (e.g., the expected order of probes and types of signals for the analyte). These expected bits of information for a particular analyte are compared with the actual L bits of information that are obtained from the target analyte. Using the Reed-Solomon approach, an allowance of up to t errors in the signals can be tolerated in the comparison of the expected bits of information and the actual L bits of information.

In some embodiments, a Reed-Solomon decoder is used to compare the expected signal sequence with an observed signal sequence from a particular probe. For example, seven probe pools may be used to identify a target analyte, the expected color sequence being BGGBBYY, represented by 14 bits. Additional parity pools may then be used for error correction. For example, six 4-bit parity symbols may be used. Then, as shown below in Table 10B, the expected signal sequence is compared against the observed signal sequence, and a decoded signal sequence is generated from the comparison.

TABLE 10B

EXPECTED SIGNAL SEQUENCE    BGGBBYYYYRYBBYGBYGR
OBSERVED SIGNAL SEQUENCE    BGG R BYYYY G YBBYGBYGR
DECODED SIGNAL SEQUENCE     BGGBBYYYYRYBBYGBYGR

The observed signal sequence has 2 errors in an ordered sequence of 19 signals. When the received probe sequence is decoded by a Reed-Solomon decoder, the original, transmitted probe sequence is recovered. The expected signal sequence is the sequence that is designed to identify one type of analyte. The observed signal sequence is the sequence of fluorescent signals received at a particular location on a solid substrate. The decoded sequence is the recovered sequence after decoding by an error correcting code decoder.
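The comparison of expected and observed signal sequences in Table 10B may be illustrated with the following Python sketch. It is not a Reed-Solomon decoder: it only locates the differing signals and checks, conservatively, that their count does not exceed t, on the assumption that the number of erroneous 4-bit symbols can never exceed the number of erroneous colors. The function names mismatches and within_capacity are hypothetical.

```python
# Sketch: locating differences between expected and observed sequences.
# Sequences are taken from Table 10B (spaces in the observed row removed).
EXPECTED = "BGGBBYYYYRYBBYGBYGR"
OBSERVED = "BGGRBYYYYGYBBYGBYGR"


def mismatches(expected, observed):
    """Return (cycle, expected, observed) for each differing signal (1-based)."""
    return [
        (index + 1, e, o)
        for index, (e, o) in enumerate(zip(expected, observed))
        if e != o
    ]


def within_capacity(expected, observed, t):
    """True if sequences have equal length and at most t differing signals."""
    if len(expected) != len(observed):
        return False
    return len(mismatches(expected, observed)) <= t


errors = mismatches(EXPECTED, OBSERVED)
print(errors, within_capacity(EXPECTED, OBSERVED, t=3))
```

With t=3, the two differing signals fall within the stated correction capacity, consistent with the decoded sequence in Table 10B matching the expected sequence.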

In another embodiment, using electrical detection of analytes, the probes and selected bits of information used in the electrical detection method follow error correction calculations as shown in Table 11 below. In Example 1, 3 bits of ID are chosen, which corresponds to a total of 8 target analytes and 8 ID numbers (2³=8). In addition, the error factor is calculated to be the number of bits of error divided by the number of bits of ID. Here, the number of bits used for error correction in this example is 9 (3×3=9), and the error factor would be 3 (9/3=3). The total number of bits per run is 12 (the sum of 3 bits of ID and 9 bits of error correction). The number of bits per cycle can be chosen as 3, and the number of probes per cycle is determined to be 8 (2³=8). Next, the number of cycles is calculated to be 4 based on the number of bits, error factor, and bits per cycle. The equation used is (bits×(1+error factor))/bits per cycle. Here, the calculation is (3×(1+3))/3=4 cycles. In this example, one stop is used per electrical tag. The number of detectable probes can be increased based on selection of higher bits, as shown in examples 2-5 in Table 11.

TABLE 11 Summary of Example Assays Using Various Number of Bits, Number of Targets, Number of Probes, Cycles and Stop Types

Example #                                  1     2     3      4       5        Equation
# Bits ID                                  3     4     8      12      16       bits
# ID's (# of Simultaneous Targets)         8     16    256    4,096   65,536   2^(bits)
Error Factor (# Bits Error/# Bits ID)      3     3     3      3       3        err
# Bits Error Correction                    9     12    24     36      48       err * bits
Total # of Bits per Run                    12    16    32     48      64       bits * (1 + err)
# Bits per Cycle                           3     4     4      6       8        bpc
# Probes per Cycle                         8     16    16     64      256      2^(bpc)
# Cycles                                   4     4     8      8       8        (bits * (1 + err))/bpc
# Stops                                    1     1     2      2       2        stp
# Stop Types                               1     3     3      2       3        typ
# Flow Sequences per Cycle                 3     7     13     9       13       1 + 2 * stp * typ
# Levels                                   9     7     4      8       10
Max # of Probes                            8     18    27     84      324
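The rows of Table 11 that carry an equation may be recomputed with the following Python sketch. The function name assay_plan and the dictionary keys are illustrative. Rounding the number of cycles up when the division is not exact is an assumption; in all five examples of Table 11 the division is exact. The rows “# Levels” and “Max # of Probes” have no equation in Table 11 and are not computed.

```python
# Sketch: derived quantities of Table 11 from the chosen parameters.
def assay_plan(bits, err, bpc, stp, typ):
    """Compute the equation-bearing rows of Table 11.

    bits -- number of ID bits          err -- error factor
    bpc  -- bits per cycle             stp -- number of stops
    typ  -- number of stop types
    """
    total_bits = bits * (1 + err)
    return {
        "ids": 2 ** bits,
        "error_bits": err * bits,
        "total_bits": total_bits,
        "probes_per_cycle": 2 ** bpc,
        # Ceiling division (assumption for non-exact cases).
        "cycles": -(-total_bits // bpc),
        "flow_sequences": 1 + 2 * stp * typ,
    }


# (bits, err, bpc, stp, typ) for Examples 1 to 5 of Table 11.
EXAMPLES = [
    (3, 3, 3, 1, 1),
    (4, 3, 4, 1, 3),
    (8, 3, 4, 2, 3),
    (12, 3, 6, 2, 2),
    (16, 3, 8, 2, 3),
]

plans = [assay_plan(*example) for example in EXAMPLES]
for number, plan in enumerate(plans, start=1):
    print(number, plan)
```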

Additional description of electrical detection methods is found in U.S. Provisional Application No. 61/868,988, which is incorporated by reference in its entirety.

VI. Dynamic Range

The concentrations of analytes such as proteins in samples such as human serum can vary by factors of greater than 10¹⁰. The dynamic ranges of likely concentrations for particular proteins are generally smaller. For example, Ferritin is normally found between 10⁴-10⁵ pg/mL in human serum. Most protein concentrations do not vary by more than a factor of 10³ from one human serum sample to another.

Because it is difficult to detect fluorescent labels corresponding to target analytes at a large dynamic range of concentrations, a substrate containing target analytes can be divided into concentration regions. For example, FIG. 12A shows an example substrate that has been divided into regions “HIGH,” “MED,” and “LOW.” The same target analytes may be distributed throughout each of the three regions. However, the target analytes are diluted to different concentration samples before distribution: one sample each of high concentration (HIGH), medium concentration (MED), and low concentration (LOW). In one embodiment, the target analytes in the “HIGH” region of the substrate are distributed at a concentration of around 1 protein per square micron, the target analytes in the “MED” region are distributed at around 10² proteins per square micron, and the target analytes in the “LOW” region are distributed at around 10⁴ proteins per square micron. In a further embodiment, the dilutions are adjusted such that the density of fluorescent labels in each concentration region is around one tag per 10-25 pixels in an image of the substrate.

FIG. 12B is a graph showing an example of abundance ranges of target analytes from a sample, located in different concentration regions (Low, Med, and High) of a substrate. FIG. 12C is also a graph showing abundance ranges of target analytes, with a fourth concentration region: “Rare.” Overlapping abundance ranges demonstrate that certain target analytes may be detected in more than one concentration region. These graphs (FIGS. 12B and 12C) were generated from simulations performed in Example 5 (below).

In one embodiment, particular target analytes within a sample may be separated from the sample to increase the dynamic range even further. For example, in a sample of human serum, it may be desirable to remove albumin, a highly abundant protein. Any separation technique may be used, including high-performance liquid chromatography.

Once the different dilutions of target analyte samples have been attached to the substrate, probes may be applied to selectively bind to the target analytes. In an embodiment, the probes may be prepared at varying concentrations so that they selectively bind to the target analytes of medium abundance in the “MED” region of the substrate.

EXAMPLES

Below are examples of specific embodiments for carrying out the present invention. The examples are offered for illustrative purposes only, and are not intended to limit the scope of the present invention in any way. Efforts have been made to ensure accuracy with respect to numbers used (e.g., amounts, temperatures, etc.), but some experimental error and deviation should, of course, be allowed for.

The practice of the present invention will employ, unless otherwise indicated, conventional methods of protein chemistry, biochemistry, recombinant DNA techniques and pharmacology, within the skill of the art. Such techniques are explained fully in the literature. See, e.g., T. E. Creighton, Proteins: Structures and Molecular Properties (W. H. Freeman and Company, 1993); A. L. Lehninger, Biochemistry (Worth Publishers, Inc., current edition); Sambrook, et al., Molecular Cloning: A Laboratory Manual (2nd Edition, 1989); Methods In Enzymology (S. Colowick and N. Kaplan eds., Academic Press, Inc.); Remington's Pharmaceutical Sciences, 18th Edition (Easton, Pa.: Mack Publishing Company, 1990); Carey and Sundberg, Advanced Organic Chemistry, 3^(rd) Ed. (Plenum Press) Vols A and B (1992).

Example 1 Optical Detection Assay for Multiple Analytes Using a Single Fluorescent Tag, Single Pass, Dark Counted, and 1 Bit Per Cycle

In one example, the method is performed using the following parameters: Single Fluorescent Tag (Single Color), Single Pass, Dark Counted, and 1 Bit per Cycle. FIG. 9 illustrates this example where the fluorescent surface density is lower than the deposited protein surface density. The Figure shows four different types of target proteins mixed with other non-target proteins shown randomly deposited on the surface.

Table 12 below shows how a total of four bits of information can be obtained using four cycles of hybridization and stripping, such that there is one pass per cycle. The signals obtained from the four cycles are digitized into bits of information.

As illustrated in FIG. 13, probes are introduced for Analyte A in Cycle 1, and the presence of the analyte is indicated by a “1.” In Cycle 2, the probes for Analyte A are not added, and the analyte is not detected, indicated by a “0.” After four cycles, Analyte A can be associated with a binary code of “1010.” In a base-2 system, the code is represented as “1010.” In a base-10 system, the same code is represented as “10” (i.e., (2³×1)+(2²×0)+(2¹×1)+(2⁰×0)=10).

In the first cycle, only antibody probes for targets A and B are included in the probe pool. The imaging system measures a single color image for the first cycle, where A and B molecules fluoresce, but C and D are dark (no probes and no signal). The probes for targets A and B are stripped. For the second cycle, antibody probes for targets C and D are introduced and imaged, and then the antibody probes for C and D are stripped. For the third cycle, antibody probes for targets A and C are introduced and imaged. The antibody probes for targets A and C are then stripped. For the fourth cycle, antibody probes for targets B and D are introduced, and the fluorescent molecules are imaged. After imaging multiple cycles, the ID (code of fluorescent signals) for the target molecule at each position is determined. Only 2 cycles are necessary for identification of 4 molecules. In some embodiments, additional cycles can be used for error correction information, which is described below, or to identify more than 4 molecules.

TABLE 12 Single Fluorescent Tag, Single Pass, Dark Counted, and 1 Bit Per Cycle

             Cycle 1   Cycle 2   Cycle 3   Cycle 4   ID - Binary   ID - Decimal
Analyte A    1         0         1         0         1010          10
Analyte B    1         0         0         1         1001          9
Analyte C    0         1         1         0         0110          6
Analyte D    0         1         0         1         0101          5
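The derivation of the binary and decimal IDs of Table 12 from the probe pool used in each cycle may be sketched in Python as follows. The names POOLS and analyte_ids are illustrative; the pool contents follow the four cycles described in this example.

```python
# Sketch: deriving the IDs of Table 12 from the probe pool of each cycle.
# A target reads "1" in a cycle if its probe is in that cycle's pool,
# and "0" (dark) otherwise.
POOLS = [
    {"A", "B"},  # Cycle 1
    {"C", "D"},  # Cycle 2
    {"A", "C"},  # Cycle 3
    {"B", "D"},  # Cycle 4
]


def analyte_ids(pools, analytes):
    """Return {analyte: (binary ID, decimal ID)} for the given pool schedule."""
    ids = {}
    for analyte in analytes:
        bits = "".join("1" if analyte in pool else "0" for pool in pools)
        ids[analyte] = (bits, int(bits, 2))
    return ids


ids = analyte_ids(POOLS, "ABCD")
print(ids)
```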

Example 2 Optical Detection Assay for Multiple Analytes Using Single Color, Four Passes Per Cycle, Dark Pass not Counted, and 2 Bits Per Cycle

In another example, the following parameters are used: Single Color, Four Passes per Cycle, Dark Pass Not Counted, and 2 bits per Cycle.

In FIG. 14, four passes are shown for a single cycle. The first pass includes probes for target analyte A. The probes for target A hybridize, and the detectable signals are imaged. For example, the probes comprise a green fluorescent molecule and emit a green color. The probes for target A are not stripped from the substrate in this example. In pass 2, the probes for target B are hybridized, and the probes for target B have the same fluorescent color as the probes for target A. The additional signals for target B (green fluors) are detected, and both of the signals for targets A and B are imaged. The probes for A and B are not stripped from the substrate.

In pass 3, probes for target C are introduced and hybridize to target C. Probes for target C emit the same fluorescent color as targets A and B. The signals emitted from the probes for targets A, B and C are imaged. In pass 4, probes for target D are hybridized, and the signals emitted from targets A, B, C and D are imaged. Finally, all probes are stripped, and the first cycle is completed.

Multiple cycles can be performed to increase the number of targets to be quantified. It is not necessary to have probes for every target in every pass, and there may be many more than four molecules observed.

Table 13 below shows how signals obtained from one cycle with four passes are digitized and represented as two bits of information per cycle. Over the period of four cycles, a total of 8 bits of information per analyte can be obtained. Table 14 provides the key for the digital output for four passes in a cycle.

TABLE 13 Single Color, Four Pass, Dark Not Counted, 2 Bits per Cycle

TABLE 14 Key for Example 2 Assay

It is possible to use multiple fluors on the secondary analytes instead of performing successive hybridizations to achieve more bits of information per cycle. For instance, four colors of fluorescent tags on secondary antibodies would allow for one hybridization step and one stripping step per cycle to achieve two bits of information per cycle.

A combination of using multiple colors and also performing multiple hybridization steps per cycle could increase the number of bits measurable per cycle. For instance, using a four-color imaging system and performing 4 steps of hybridization per cycle would allow up to four bits of information per cycle to be achieved.

Example 3 Optical Detection Assay for Multiple Analytes Using Five Colors, Three Passes Per Cycle, Dark Pass Counted, Four Bits Per Cycle

In another example, the following parameters are used: Five Colors, Three Passes per Cycle, Dark Pass Counted, Four bits per Cycle.

The following tables illustrate an assay with a five color system with 3 passes of hybridization per cycle. A total of 16 levels or equivalently four bits of information is possible per cycle if the absence of any signal is considered to be a level (i.e., dark cycle counted). Table 16 provides a key for the ID code for each analyte.

TABLE 15

TABLE 16

Table 17 below shows the number of bits per cycle for a multi-color, multi-pass hybridization for optical detection, with and without the absence of signal considered to be a level (dark cycle counted/dark cycle not counted).

TABLE 17 Number of Bits per Cycle for Multi-Color, Multi-Pass Hybridization for Optical Detection

                                   Dark Not Counted            Dark Counted
# Colors    # Passes               # of Levels   # Bits        # of Levels   # Bits
Per Cycle   (Hybridizations)       Per Cycle     Per Cycle     Per Cycle     Per Cycle
            Per Cycle
1           1                      1             0.00          2             1.00
1           2                      2             1.00          3             1.58
1           3                      3             1.58          4             2.00
1           4                      4             2.00          5             2.32
1           5                      5             2.32          6             2.58
2           1                      2             1.00          3             1.58
2           2                      4             2.00          5             2.32
2           3                      6             2.58          7             2.81
2           4                      8             3.00          9             3.17
2           5                      10            3.32          11            3.46
3           1                      3             1.58          4             2.00
3           2                      6             2.58          7             2.81
3           3                      9             3.17          10            3.32
3           4                      12            3.58          13            3.70
3           5                      15            3.91          16            4.00
4           1                      4             2.00          5             2.32
4           2                      8             3.00          9             3.17
4           3                      12            3.58          13            3.70
4           4                      16            4.00          17            4.09
4           5                      20            4.32          21            4.39
5           1                      5             2.32          6             2.58
5           2                      10            3.32          11            3.46
5           3                      15            3.91          16            4.00
5           4                      20            4.32          21            4.39
5           5                      25            4.64          26            4.70
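The entries of Table 17 are consistent with the number of levels per cycle being the product of the number of colors and the number of passes, plus one when the dark level is counted, and the number of bits per cycle being the base-2 logarithm of the number of levels. A minimal Python sketch of that relationship follows; the function names are illustrative.

```python
# Sketch: levels and bits per cycle for multi-color, multi-pass hybridization.
import math


def levels_per_cycle(colors, passes, dark_counted):
    """Levels per cycle: colors x passes, plus one if dark is a level."""
    return colors * passes + (1 if dark_counted else 0)


def bits_per_cycle(colors, passes, dark_counted):
    """Bits per cycle: base-2 logarithm of the number of levels."""
    return math.log2(levels_per_cycle(colors, passes, dark_counted))


# Example 2: one color, four passes, dark not counted.
print(bits_per_cycle(1, 4, dark_counted=False))
# Example 3: five colors, three passes, dark counted.
print(bits_per_cycle(5, 3, dark_counted=True))
```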

Example 4 Demonstrating DNA Probe and Target Hybridization and Stripping at a Bulk Level

Nucleic acids were used to demonstrate APTIQ probe/target hybridization and stripping cycles at a bulk level. Oligonucleotides (Table 18) were purchased from IDT Integrated DNA Technologies (Coralville, Iowa). Oligos were dissolved in molecular grade water at a final concentration of 100 μM and were stored at −20° C.

TABLE 18 Oligo and Probe Sequences

Sequence Name    Sequence
Oligo B1         /5AmMC6/d(A)₄₀ GCA CCC TTG GTC TCC TCC A
Oligo B2         /5AmMC6/d(A)₄₀CT CAG CAG CAT CTC AGG GCC A
Oligo B3         /5AmMC6/d(A)₄₀GCT GCA TGC ACG CAC ACA CA

Probe Name       Sequence
Cy3-anti-B1      /5Cy3/TGG AGG AGA CCA AGG GTG CAG T
Cy3-anti-B2      /5Cy3/TGG CCC TGA GAT GCT GCT GAG T
Cy3-anti-B3      /5Cy3/TGT GTG TGC GTG CAT GCA GC
Cy5-anti-B1      /5Cy5/TGG AGG AGA CCA AGG GTG CAG T
Cy5-anti-B2      /5Cy5/TGG CCC TGA GAT GCT GCT GAG T
Cy5-anti-B3      /5Cy5/TGT GTG TGC GTG CAT GCA GC

Oligos with C6-amino linkers were printed on microarrays at ArrayIt (Sunnyvale, Calif.). Unless otherwise specified, all reagents and equipment used in these Examples were purchased from ArrayIt. Oligos were printed at 50 μM final concentration in 1× MSP buffer (Cat ID: MSP) on SuperEpoxy 2 Microarray Substrates (Cat ID: SME2), on a NanoPrint Microarrayer using SMP3 Microarray Printing Pin. Printed microarrays were dried overnight.

Prior to use, a substrate slide was blocked for 1.5 hours in Blockit Blocking Buffer (Cat ID: BKT) at room temperature with gentle agitation at 350 rpm, followed by washing 3 times, 1 minute at a time, with Wash Buffers 1, 2, 3 at 1× (Cat ID: WB1, WB2, WB3) in a square petri dish, 30 ml volume at 350 RPM, 2 mm orbit. The slide was then spin dried for 10 seconds with a Microarray Centrifuge (Cat ID: MHC110).

A Gasket (Cat ID: GAHC4×24) was blocked in Blockit blocking buffer for at least 1 hour, rinsed using distilled water, dried using Microarray Cleanroom Wipe (Cat ID: MCW), and loaded into the lid of the cassette (Cat ID: AHC4×24). Hybit hybridization buffer (Cat ID: HHS2) was used for hybridization at 1×. Cy3 or Cy5 labeled probes (Table 18) (corresponding to color G (green) or R (red), respectively) were mixed in the probe pools in 1× hybridization buffer. 75 μl of hybridization probe pools were loaded on the microarray and incubated for 15 minutes at 37° C., 350 RPM, on Arrayit Array Plate Hyb Station (Cat ID: MMHS110V).

100 μl of wash buffer 1 (at 37° C.) was added to each well and then incubated for 1 minute at 350 RPM. The wash buffer was then removed by expelling it into the waste. Wash buffer 1 was used two more times, wash buffer 2 was used three times, and then wash buffer 3 was used three times.

The slide was removed from the gasket, submerged in wash buffer 3 in a container, and spin dried in the Microarray Centrifuge. The slide was scanned in an Axon scanner 4200A with a setting of PMT250 for both the 532 and 635 lasers. The slide was incubated with spots side up in a square petri dish containing 30 ml of Stripping Buffer A at 350 RPM for 10 minutes. Stripping Buffer A was removed and immediately followed by addition of 30 ml of Stripping Buffer B. The procedure was repeated with Stripping Buffer B and Stripping Buffer C. The slide was dried in the microarray centrifuge and prepared for the next cycle of hybridization. The slide was scanned after stripping to confirm the efficiency of the stripping.

FIG. 15 illustrates the scanning results: three oligo targets (B1, B2, and B3) were identified by probe binding and stripping for 6 cycles. The color sequence of each oligo target was correctly identified.

Example 5 Using DNA to Demonstrate Single Molecule Counting

We describe a method for the identification and quantification of single molecules. Oligonucleotides (Table 19) were purchased from IDT Integrated DNA Technologies. Oligos were dissolved in molecular grade water at a final concentration of 100 μM and were stored at −20° C.

TABLE 19 Oligo and Probe Sequences

Sequence Name    Sequence
C6-P53           /5AmMC6/AAA AAA ACT GCA CCC TTG GTC TCC TCC A
C6-BRAF          /5AmMC6/AAA AAA ACT CAG CAG CAT CTC AGG GCC A
C6-EGFR          /5AmMC6/AAA AAA GCT GCA TGC ACG CAC ACA CA
C6-KRAS          /5AmMC6/AAA AAA ATC CCA GCA CCA CCA CTA CCG A

Probe Name       Sequence
Cy3-anti-P53     /5Cy3/TGG AGG AGA CCA AGG GTG CAG T
Cy3-anti-BRAF    /5Cy3/TGG CCC TGA GAT GCT GCT GAG T
Cy3-anti-EGFR    /5Cy3/TGT GTG TGC GTG CAT GCA GC
Cy3-anti-KRAS    /5Cy3/TCG GTA GTG GTG GTG CTG GGA T
Cy5-anti-P53     /5Cy5/TGG AGG AGA CCA AGG GTG CAG T
Cy5-anti-BRAF    /5Cy5/TGG CCC TGA GAT GCT GCT GAG T
Cy5-anti-EGFR    /5Cy5/TGT GTG TGC GTG CAT GCA GC
Cy5-anti-KRAS    /5Cy5/TCG GTA GTG GTG GTG CTG GGA T
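As an illustrative check only, the complementarity between each probe and its target oligo in Table 19 may be verified with the following Python sketch. The sequences are copied from Table 19 with the 5' modification tags omitted; the names reverse_complement, TARGETS and PROBES are hypothetical.

```python
# Sketch: checking that each probe of Table 19 is the reverse complement
# of the 3' portion of its target oligo (5' modification tags omitted).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence."""
    return sequence.translate(COMPLEMENT)[::-1]


def clean(sequence):
    """Remove the triplet spacing used in Table 19."""
    return sequence.replace(" ", "")


TARGETS = {
    "P53": clean("AAA AAA ACT GCA CCC TTG GTC TCC TCC A"),
    "BRAF": clean("AAA AAA ACT CAG CAG CAT CTC AGG GCC A"),
    "EGFR": clean("AAA AAA GCT GCA TGC ACG CAC ACA CA"),
    "KRAS": clean("AAA AAA ATC CCA GCA CCA CCA CTA CCG A"),
}

PROBES = {
    "P53": clean("TGG AGG AGA CCA AGG GTG CAG T"),
    "BRAF": clean("TGG CCC TGA GAT GCT GCT GAG T"),
    "EGFR": clean("TGT GTG TGC GTG CAT GCA GC"),
    "KRAS": clean("TCG GTA GTG GTG GTG CTG GGA T"),
}

# A probe hybridizes to the part of the target following the poly-A spacer.
pairs = {
    name: TARGETS[name].endswith(reverse_complement(PROBES[name]))
    for name in TARGETS
}
print(pairs)
```

The Cy3 and Cy5 versions of each probe share the same sequence, so a single check per gene covers both labels.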

Silicon slides were purchased from University Wafer (Boston, Mass.), diced (American Precision Dicing Inc., San Jose, Calif.), and coated with SuperEpoxy substrate (ArrayIt). The single crystal silicon chips were prepared as 25 mm×75 mm substrate slides. The thicknesses of the silicon chips used were 500 μm, 675 μm, and 1000 μm. A 100 nm thermal oxide was grown on the silicon chips, which were then diced into slides.

A slide was incubated in a solution of 4 DNA oligos (Table 19), each oligo ending in a C6 molecule. The sequences of the 4 oligos were A, B, C & D, corresponding to the genes encoding KRAS, EGFR, BRAF and P53. The 4 oligos with C6-amino linker were mixed at 100 nM per oligo in 1× micro spotting solution (Cat ID: MSS, ArrayIt) and then incubated on the epoxy coated silicon slide in a container at room temperature overnight. During incubation, a reaction between the epoxy coating and the C6 oligos covalently bonded the single stranded DNA to the surface. The slide was then washed with molecular grade water for 5 minutes, 3 times, followed by incubation in ArrayIt BlockIt blocking solution for 1 hour at room temperature with gentle agitation at 350 rpm, followed by washing 3 times for 1 minute each time with Wash Buffers 1, 2, 3 at 1× in a square petri dish, 30 ml volume at 350 RPM, 2 mm orbit. The slide was spin dried for 10 seconds with the Microarray Centrifuge.

The chip was fabricated with glue into a biochip consisting of 3 parts: a silicon chip, a PEEK frame, and a 170 μm-thick coverslip glass. The coverslip (Nexterion, Tempe, Ariz.) was glued on the silicon slide with Bostik glue mixed with 50 uM beads (Gelest, Morrisville, Pa.) on an in-house developed device. The glue and beads were packed in a 3 cc syringe (Hamilton Company, Reno, Nev.), centrifuged in an EFD ProcessMate centrifuge (Nordson, Westlake, Ohio), and then delivered by a Nordson EFD Ultimus I glue dispenser.

Cy3 or Cy5 labeled probes (Table 19) were mixed in the probe pools in 1× HybIt hybridization buffer. Hybridization solution from pool #1 was delivered in the biochip and incubated for 15 minutes at room temperature. The chip was then washed with washing buffer 1, 2 and 3 (ArrayIt), 8 times with each buffer. 15% glycerol in 1× SSPE (150 mM NaCl, 10 mM NaH2PO4, 1 mM EDTA) was added to the chip before imaging. Successive probe pools of probes 1, 2, 3, 4 were hybridized and stripped. After each hybridization step, an imaging system imaged 12 regions of the slide, each region being 100 μm×100 μm. The camera used was a Hamamatsu Orca 4.0 with a 40× magnification system using Olympus part # UAPON40XW.

After imaging, the chip was rinsed with molecular grade water and then stripped in stripping buffers A, B, and C (ArrayIt), 8 times with each buffer. 15% glycerol in 1× SSPE was added to the chip before imaging. After cycle 1, which includes hybridization with probe pool #1, imaging, stripping, and imaging, cycle 2 starts with hybridization with probe pool #2.

Data was taken on two slides (slides #177 and #179, FIG. 16A), each slide containing twelve fields. Each field was 100 μm×100 μm. A two-color imaging system was used with CY3 and CY5 filters. For each slide, at least 12-14 cycles of data were collected, with the analysis using 9 to 10 cycles of data. The mapping of target identification ID to color sequence is illustrated in Table 19. In FIG. 16A, each color maps to a sequence such that a probe is labeled with CY5 or CY3, corresponding respectively to color R (red) or G (green), which maps to 1 or 0 with 1 bit of information being acquired per cycle (FIG. 16B). For this case, the error correction scheme was conservative and required zero errors per target, an error being defined as a positive identification in a sequence where it was not expected. Up to five missing sequences were allowed per molecule. Missing sequences are cases where a molecule is not identified in a cycle. These are not classified as errors.
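The conservative acceptance rule described above (zero errors, up to five missing cycles) may be sketched in Python as follows. The color codes in CODES are hypothetical placeholders chosen for illustration and are not the codes of FIG. 16B; the function name classify, the use of None for a missing cycle, and the rejection of ambiguous matches are likewise assumptions.

```python
# Sketch: accepting a molecule only with zero errors and few missing cycles.
# Hypothetical 9-cycle codes (R maps to 1, G maps to 0); NOT from FIG. 16B.
CODES = {
    "P53": "RRRGGGRGR",
    "BRAF": "RGRGRRGGR",
    "EGFR": "GRGRGGRRG",
    "KRAS": "GGRRGRGRR",
}
MAX_MISSING = 5


def classify(observed, codes, max_missing=MAX_MISSING):
    """Return the matching target, or None.

    observed -- per-cycle signal: "R", "G", or None when the molecule
                was not identified in that cycle (a missing sequence)
    """
    missing = sum(1 for signal in observed if signal is None)
    if missing > max_missing:
        return None
    matches = []
    for target, expected in codes.items():
        # An error is a detected color that differs from the expected one.
        errors = sum(
            1
            for signal, wanted in zip(observed, expected)
            if signal is not None and signal != wanted
        )
        if errors == 0:
            matches.append(target)
    # Reject molecules that are unmatched or ambiguous (assumption).
    return matches[0] if len(matches) == 1 else None


print(classify(["R", None, "R", "G", None, "G", "R", "G", "R"], CODES))
```

Treating missing cycles separately from errors means a dim or unbound probe lowers confidence without causing a wrong identification, at the cost of rejecting molecules whose remaining cycles match more than one code.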

Slides #177 and #179 were measured under similar conditions. A small portion of each slide was measured (measuring the entire slide is a matter of scale and automation). FIG. 16A shows the number of molecules in each field with the number of each gene that was identified. The percentages of identified molecules were 12%-13%.

FIG. 17 illustrates an image taken with the prototype imager of single fluor DNA probes that have been hybridized to the DNA target oligos covalently bonded to the surface. Between 10%-15% of the identified molecules had multiple fluors per spot due to aggregation that occurred during sample attachment.

FIG. 18 illustrates representative examples of identification of each of the four targets from slide #177 (FIG. 16A). Each spot in the circle is aligned to center on the target. The targets are identified with single fluor detection. Approximately 10%-15% of the targets on the image were clustered molecular species where more than a single fluor was bound. Inspecting the data shows that for both experiments, greater levels of P53 and KRAS were observed than BRAF and EGFR. The total number of molecules identified was under 2000 in both cases, demonstrating the potential high sensitivity of the method in detecting and identifying low numbers of molecules.

Example 6 Demonstrating Peptide Probe and Target Hybridization and Stripping at a Bulk Level

Peptides were used to demonstrate APTIQ probe/target binding and stripping cycles at bulk level. Peptide MUC1 (Sequence: APDTRPAPG) was purchased from American Peptide (Sunnyvale, Calif.). MUC1 was dissolved at 1 mg/ml in 1× peptide printing buffer 1 (Cat ID: PEP, ArrayIt). Peptide MUC16 at 0.2 mg/ml and monoclonal antibodies, mouse anti-MUC1 C595 [Cat ID: NCRC48] and rabbit anti-MUC16 [Cat ID: EPSISR23], were purchased from Abcam (Cambridge, Mass.). The following secondary antibodies were also purchased from Abcam: goat anti-mouse IgG Cy3 (Cat ID: ab97035), goat anti-rabbit IgG Cy3 (Cat ID: ab6939), goat anti-mouse IgG Cy5 (Cat ID: ab97037), goat anti-rabbit IgG Cy5 (Cat ID: ab6564).

Peptides were printed on microarrays at ArrayIt (Sunnyvale, Calif.). MUC1 peptide was printed at 0.5 mg/ml final concentration and MUC16 at 0.1 mg/ml in 1× peptide printing buffer 2 (Cat ID: PEP, ArrayIt) on SuperEpoxy 2 Microarray Substrates on a NanoPrint Microarrayer using an SMP3 Microarray Printing Pin. Printed microarrays were dried overnight.

Prior to use, the slide was blocked for 1.5 hours in BlockIt Plus Blocking Buffer (Cat ID: BKTP, ArrayIt) at room temperature with gentle agitation at 350 rpm, followed by washing 3 times, 1 minute each, with 1×PBS in a square petri dish (30 ml volume) at 350 rpm with a 2 mm orbit. The slide was spin dried for 10 seconds with the ArrayIt Microarray Centrifuge.

Anti-MUC1 and anti-MUC16 primary antibodies were diluted 250 fold in 1×PBS buffer (137 mM NaCl; 2.7 mM KCl; 10 mM Na2HPO4; 2 mM KH2PO4, pH 7.4). Secondary antibodies were diluted 10,000 fold in 1×PBS. Cy3 or Cy5 labeled antibodies were mixed in 2 pools in 1×PBS: Pool #1: anti-mouse Cy3 and anti-rabbit Cy5; Pool #2: anti-rabbit Cy5 and anti-rabbit Cy3.

5 ml of the mixture of primary probe pools was added to the slide and incubated for 1 hour at room temperature in a container. The slide was then washed with 1×PBS, 3 times, 5 minutes each time, with gentle shaking at 450 rpm.

Secondary antibody pool #1 was added to the slide and incubated for 1 hour at room temperature. The slide was then washed with 1×PBS, 3 times, 5 minutes each time, with gentle shaking at 450 rpm. The slide was removed from the container and dried in the Microarray Centrifuge. The slide was scanned in an Axon 4200A with settings at PMT250 for both the 532 and 635 lasers.

The slide was incubated with the spots side up in a square petri dish containing 5 ml of Stripping Buffer (Cat ID: 21028, Fisher Scientific, Rockford, Ill.) at 300 rpm for 1 hour. The slide was then washed with distilled water 3 times, for 5 minutes each time. The slide was dried in the microarray centrifuge and then prepared for the next cycle of antibody binding and stripping. The slide was scanned after stripping to make sure the stripping was efficient.

FIG. 19 illustrates the scanning results: two peptide targets (MUC1 and MUC16) were identified by probe binding and stripping for 4 cycles. The color sequence of each peptide target was correctly identified.

Example 7 Using Peptides to Demonstrate Single Molecule Counting

Preparation of peptides was performed using the same technique as in Example 4. Peptide MUC1 (20 ng/ml) and MUC16 (4 ng/ml) were diluted in ArrayIt 1× peptide printing buffer 2 (ArrayIt, Sunnyvale, Calif.) and then incubated on a silicon slide in a container at room temperature overnight. The slide was then washed with molecular grade water for 5 minutes, 3 times, followed by incubation in ArrayIt BlockIt Plus blocking solution for 1 hour at room temperature with gentle agitation at 300 rpm. The chip was subsequently washed with molecular grade water for 5 minutes, 3 times. The slide was spin dried in a microarray high-speed centrifuge. The slide was then built into a biochip following the same procedures as above.

Primary antibodies were diluted 250 fold in 1×PBS. Secondary antibodies were diluted 10,000 fold in 1×PBS. A mixture of primary antibodies against MUC1 and MUC16 was delivered into the biochip and incubated for 60 minutes at room temperature. The chip was then washed 8× with 1×PBS. A mixture of secondary antibodies (either pool #1 containing anti-mouse Cy3 and anti-rabbit Cy5 or pool #2 containing anti-rabbit Cy5 and anti-rabbit Cy3) was delivered into the biochip and incubated for 60 minutes at room temperature. The chip was then washed 8× with 1×PBS. 15% glycerol in 1×SSPE was added to the chip before imaging.

FIG. 20 shows an image of single molecule peptides, such that conjugated antibodies (CY5 and CY3) were bound to isolated peptides which in turn were covalently bonded to the chip surface. Multiple fluors may be bound to a given antibody, which creates a spread in the intensity of each observed molecule. These molecules were measured to be somewhat brighter than the DNA single molecule measurements where a single fluor was conjugated to each DNA probe. A total of six cycles were run of two proteins in single molecule mode. The molecules were bound and removed with high yield. Similar techniques can be used for scaling the system up to many more proteins and DNA/RNA targets.

After imaging, the biochip was rinsed with molecular grade water and then stripped in the stripping buffer (Cat ID: 21028, Fisher Scientific, Rockford, Ill.) for 1 hour, followed by washing 8× with 1×PBS. 15% glycerol in 1×SSPE was added to the biochip before imaging. After cycle 1, which includes hybridization with probe pool #1, imaging, stripping, and imaging, cycle 2 started with hybridization with probe pool #2.

Example 8 Quantifying the Human Plasma Proteome

A system model was created to demonstrate the feasibility of measuring the concentration of the ˜10,000 proteins in the human plasma proteome across 12 logs of dynamic range using single-molecule identification with Reed-Solomon error correction encoding. For this model, the proteins in the plasma proteome were divided into three concentration regions as shown in FIG. 12B, referred to as low, medium and high-concentration regions. Theoretically, the total range of protein concentrations in the human plasma proteome is over 10 logs of dynamic range, but the concentration of each protein does not vary by more than a few logs. FIG. 12B depicts overlapping concentration regions. For each of the regions, probes are selected for proteins that are expected to fall within that particular region but not exceed the maximum concentration allowable for that region. Greater overlap can be achieved by adding another concentration range, as shown in FIG. 12C, with the trade-off being that a greater area of the substrate chip is used.

The data used in the model came from the UniProt database (uniprot.org, FASTA file for organism 9606); “Toward a Human Blood Serum Proteome,” Joshua Adkins et al.; “The Human Plasma Proteome,” N. Leigh Anderson et al.; and “A High-Confidence Human Plasma Proteome Reference Set with Estimated Concentrations in PeptideAtlas,” T. Farrah et al. Because not all proteins in the UniProt database are associated with a published concentration, random concentrations were assigned without changing the well-known highly abundant protein concentrations or the overall concentration. FIG. 21 shows a probability plot of the estimated concentrations. Concentrations between 10⁵ and 10¹¹ pg/mL use published values. All published values found in the lower range were used. The remaining protein concentrations were estimated using a log normal approximation, with an estimated Gaussian distribution over a log normal space of 6 orders of magnitude from 10⁻¹ pg/mL to 10⁴ pg/mL.

FIG. 22 lists specific estimated values for each of the abundance regions used. For each abundance region, the same target protein density (dTP) of 1.5 target proteins per μm² was assumed. However, in the low abundance region, the number of high abundance region proteins per number of target proteins (rHT) was 25,000 to 1, and 250 to 1 in the medium abundance region. The number of high abundance proteins per pixel (rHP) ranged from 1,000 to 0.04. The number of pixels per target protein was constant for each of the three regions.

For the model, a four color imaging system is assumed, giving 2 bits of information per cycle. FIG. 23 shows what a simulated image would be expected to look like for any of the colors across any of the abundance ranges, except for the case of albumin in the high abundance range, when at any given time half of the target molecules are expected to be the same color. A field is the imaging region of the camera for a given exposure; in this case a 2,000 by 2,000 pixel camera is assumed with 40× magnification, giving 163 nm pixels. A total of 2,500 fields (nF) are required for the low abundance region, but only 250 fields each are required for the high and medium abundance regions.

With the system model optimized, it was determined that the lowest abundance region would interrogate 9,575 proteins out of 9,719, or 98.5% of the proteome. At the other extreme, the high abundance region interrogates only the top 2.9% of the proteome. This is because only a small percentage of the proteins in the plasma proteome make up the high abundance region. The measurable concentration ranges vary depending on the concentration region. The low abundance concentration region measures concentrations between 30 fg/mL and 300 ng/mL. The medium abundance concentration region measures concentrations between 82 pg/mL and 85 μg/mL. The high abundance concentration region measures concentrations between 20 ng/mL and 100 mg/mL. The total chip area required for this measurement is 320 mm², or a chip with dimensions of 18 mm×18 mm.
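
The chip area quoted above can be cross-checked from the field geometry given for the model (2,000 by 2,000 pixels of 163 nm, and 2,500 + 250 + 250 fields). The short sketch below does this arithmetic. It assumes that fields tile the chip without overlap or gaps, which is a simplification introduced here; the variable names are illustrative only.

```python
import math

# Values stated in the model description.
PIXELS_PER_SIDE = 2000          # camera is 2,000 x 2,000 pixels
PIXEL_SIZE_NM = 163             # pixel size at 40x magnification
FIELDS_PER_REGION = {"low": 2500, "medium": 250, "high": 250}
TARGET_DENSITY_PER_UM2 = 1.5    # dTP, target proteins per square micron

# Assumption (not from the source): fields tile the chip with no overlap or gaps.
field_side_um = PIXELS_PER_SIDE * PIXEL_SIZE_NM / 1000.0
field_area_um2 = field_side_um ** 2
total_fields = sum(FIELDS_PER_REGION.values())
chip_area_mm2 = total_fields * field_area_um2 / 1e6
chip_side_mm = math.sqrt(chip_area_mm2)
targets_per_field = TARGET_DENSITY_PER_UM2 * field_area_um2

if __name__ == "__main__":
    print(f"field side: {field_side_um:.0f} um")
    print(f"total fields: {total_fields}")
    print(f"chip area: {chip_area_mm2:.0f} mm^2 (side {chip_side_mm:.1f} mm)")
    print(f"targets per field: {targets_per_field:.2e}")
```

Under these assumptions the area comes to roughly 319 mm² (a square of about 17.9 mm on a side), in line with the stated 320 mm² and 18 mm×18 mm, and the target density gives about 1.6×10⁵ target molecules per field.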

An analysis was conducted to determine the efficacy of Reed-Solomon error correction in the plasma proteome measurement across 12 logs of dynamic range. There is an intrinsic error rate for each measurement cycle for each counted molecule. Since each molecule is spread out over a slide (particularly significant for the low abundance molecules), there will be other molecules nearby that should not cross-react with probes but still do. A robust system will allow for this to occur and will be able to correct these errors and give the correct identification of a molecule. The rate at which incorrect binding occurs per molecule per cycle is the raw error rate. The system error rate is the per molecule identification error rate after correction has been performed.

FIG. 24 illustrates a chart of the expected system error rate vs. the raw error rate for cases with varying raw error rates and varying numbers of imaging cycles. The data for the chart was generated using a Monte Carlo simulation in which a large number of system configurations were simulated. To identify 16,384 proteins, a total of 7 data cycles are required for a four color system (4⁷=16,384). The maximum allowable error rate is calculated by dividing the allowable number of system errors by the number of molecules identified. In this case, for an average of one error per protein, the allowable number of errors is 16,384. The number of molecules identified is 4.0×10⁸ (2,500 fields×1.6×10⁵ molecules per field), with an allowable error rate calculated to be 4.1×10⁻⁵.
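
The arithmetic in this paragraph can be reproduced with a few lines of code. The sketch below is only a worked check of the stated numbers; the function and variable names are introduced here for illustration.

```python
def min_data_cycles(n_targets: int, n_colors: int) -> int:
    """Smallest number of cycles M such that n_colors**M >= n_targets."""
    cycles, capacity = 0, 1
    while capacity < n_targets:
        capacity *= n_colors
        cycles += 1
    return cycles


# Values stated in the text.
N_COLORS = 4                  # four color system, 2 bits per cycle
N_TARGETS = 16_384            # proteins to identify
FIELDS = 2_500                # fields in the low abundance region
MOLECULES_PER_FIELD = 1.6e5
ERRORS_PER_PROTEIN = 1        # average of one error allowed per protein

data_cycles = min_data_cycles(N_TARGETS, N_COLORS)
molecules_identified = FIELDS * MOLECULES_PER_FIELD
allowable_errors = ERRORS_PER_PROTEIN * N_TARGETS
max_system_error_rate = allowable_errors / molecules_identified

if __name__ == "__main__":
    print(f"data cycles: {data_cycles}")
    print(f"molecules identified: {molecules_identified:.1e}")
    print(f"max system error rate: {max_system_error_rate:.2e}")
```

The result is 7 data cycles, 4.0×10⁸ molecules, and a maximum system error rate of about 4.1×10⁻⁵, matching the figures above.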

Reed-Solomon encoding requires parity cycles to improve the raw error rate. Assuming a Reed-Solomon system over a Galois field with mm=4, each symbol (or word) is a 4-bit symbol that can be represented by two 2-bit symbols (i.e., 2 cycles of a four color system that obtains 2 bits per cycle). For a Reed-Solomon system, the length of the code word is nn=(2^mm)−1, or 15 4-bit symbols, or equivalently 30 2-bit symbols. This means that up to 30 cycles may be processed by a four-color fluidics/imaging system. The number of errors that can be corrected is 3, 4 or 5 per target, which corresponds to tt={3, 4, or 5} parity symbols. Four imaging cycles are required per parity cycle. This gives a total of 7 data cycles for the ID and 12, 16, or 20 imaging cycles for the error correction. This means that the total number of cycles required to identify 16,384 simultaneous proteins is 19, 23 and 27 cycles for 3, 4 and 5 allowable errors per molecule. As previously calculated, the maximum system error rate of 4.1×10⁻⁵ allows one error per protein. If more errors per protein are allowed, then the maximum system error rate can drop proportionately.
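
The cycle counts in this paragraph can be tallied as in the sketch below. It is one reading that reproduces the stated totals, not a Reed-Solomon encoder: it assumes the usual Reed-Solomon relation that correcting tt symbol errors takes 2·tt parity symbols, each 4-bit symbol being read out in two 2-bit imaging cycles, which yields four imaging cycles per correctable error. The names used are illustrative.

```python
# Parameters stated in the text.
MM = 4               # bits per Reed-Solomon symbol
BITS_PER_CYCLE = 2   # four color system
DATA_CYCLES = 7      # 4**7 = 16,384 identifiable targets

codeword_symbols = 2 ** MM - 1                   # nn = 15 symbols per code word
cycles_per_symbol = MM // BITS_PER_CYCLE         # 2 imaging cycles per 4-bit symbol
max_cycles = codeword_symbols * cycles_per_symbol


def total_cycles(tt: int) -> int:
    """Data cycles plus parity cycles for tt correctable errors.

    Assumption: 2*tt parity symbols are needed to correct tt symbol errors
    (the standard Reed-Solomon relation).
    """
    parity_symbols = 2 * tt
    parity_cycles = parity_symbols * cycles_per_symbol
    return DATA_CYCLES + parity_cycles


totals = {tt: total_cycles(tt) for tt in (3, 4, 5)}

if __name__ == "__main__":
    print(f"code word length: {codeword_symbols} symbols, {max_cycles} cycles")
    for tt, cycles in totals.items():
        print(f"tt={tt}: {cycles} total cycles")
```

All three totals (19, 23 and 27 cycles) fit within the 30-cycle code word length.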

FIG. 24 shows the contours of raw error rate vs. system error rate. For 19 imaging cycles the maximum raw error rate allowable is 3%, for 23 cycles the maximum raw error rate allowable is 6%, and for 27 cycles the maximum raw error rate allowable is 13%. It is expected that the raw error rate will be less than 5%, although for all but the rarest proteins raw error rates of up to 20% appear to be well within the acceptable range of this technology.

Since the maximum number of cycles allowable is 30 cycles, more data cycles could be included. In particular, if three more data cycles were included, the number of identifiable targets would increase by 4³, or 64×, resulting in a maximum of 1,048,576 identifiable targets, a number higher than realistic probe concentrations will allow. However, this illustrates that the technique is scalable to an arbitrarily large number of molecules, limited only by biology.

SUMMARY

The foregoing description of the embodiments of the invention has been presented for the purpose of illustration; it is not intended to be exhaustive or to limit the invention to the precise forms disclosed. Persons skilled in the relevant art can appreciate that many modifications and variations are possible in light of the above disclosure.

Some portions of this description describe the embodiments of the invention in terms of algorithms and symbolic representations of operations on information. These algorithmic descriptions and representations are commonly used by those skilled in the data processing arts to convey the substance of their work effectively to others skilled in the art. These operations, while described functionally, computationally, or logically, are understood to be implemented by computer programs or equivalent electrical circuits, microcode, or the like. Furthermore, it has also proven convenient at times to refer to these arrangements of operations as modules, without loss of generality. The described operations and their associated modules may be embodied in software, firmware, hardware, or any combinations thereof.

Any of the steps, operations, or processes described herein may be performed or implemented with one or more hardware or software modules, alone or in combination with other devices. In one embodiment, a software module is implemented with a computer program product comprising a computer-readable medium containing computer program code, which can be executed by a computer processor for performing any or all of the steps, operations, or processes described.

Embodiments of the invention may also relate to an apparatus for performing the operations herein. This apparatus may be specially constructed for the required purposes, and/or it may comprise a general-purpose computing device selectively activated or reconfigured by a computer program stored in the computer. Such a computer program may be stored in a non-transitory, tangible computer readable storage medium, or any type of media suitable for storing electronic instructions, which may be coupled to a computer system bus. Furthermore, any computing systems referred to in the specification may include a single processor or may be architectures employing multiple processor designs for increased computing capability.

Embodiments of the invention may also relate to a product that is produced by a computing process described herein. Such a product may comprise information resulting from a computing process, where the information is stored on a non-transitory, tangible computer readable storage medium and may include any embodiment of a computer program product or other data combination described herein.

Finally, the language used in the specification has been principally selected for readability and instructional purposes, and it may not have been selected to delineate or circumscribe the inventive subject matter. It is therefore intended that the scope of the invention be limited not by this detailed description, but rather by any claims that issue on an application based hereon. Accordingly, the disclosure of the embodiments of the invention is intended to be illustrative, but not limiting, of the scope of the invention, which is set forth in the following claims.

All references, issued patents and patent applications cited within the body of the instant specification are hereby incorporated by reference in their entirety, for all purposes.

What is claimed is:
 1. A method for detecting a plurality of analytes, comprising: obtaining a plurality of ordered probe reagent sets, each of said ordered probe reagent sets comprising one or more probes directed to a defined subset of N distinct target analytes, wherein said N distinct target analytes are immobilized on spatially separate regions of a substrate, and each of said probes is detectably labeled; performing at least M cycles of probe binding and signal detection, each cycle comprising one or more passes, wherein a pass comprises use of at least one of said ordered probe reagent sets; detecting from said at least M cycles a presence or an absence of a plurality of signals from said spatially separate regions of said substrate; and determining from said plurality of signals at least K bits of information per cycle for one or more of said N distinct target analytes, wherein said at least K bits of information are used to determine L total bits of information, wherein K×M=L bits of information and L≧log₂ (N), and wherein said L bits of information are used to determine a presence or an absence of one or more of said N distinct target analytes.
 2. The method of claim 1, wherein L>log₂ (N), and wherein L comprises bits of information for target identification.
 3. The method of claim 1, wherein L>log₂ (N), and wherein L comprises bits of information that are ordered in a predetermined order.
 4. The method of claim 3, wherein said predetermined order is a random order.
 5. The method of claim 1, wherein L>log₂ (N), and wherein L comprises bits of information comprising a key for decoding an order of said plurality of ordered probe reagent sets.
 6. The method of claim 1, further comprising digitizing said plurality of signals to expand a dynamic range of detection of said plurality of signals.
 7. The method of claim 1, wherein said at least K bits of information comprise information about the number of passes in a cycle.
 8. The method of claim 1, wherein said at least K bits of information comprise information about the absence of a signal for one of said N distinct target analytes.
 9. The method of claim 1, wherein said detectable label is a fluorescent label.
 10. The method of claim 1, wherein said probe comprises an antibody.
 11. The method of claim 10, wherein antibody is conjugated directly to a label.
 12. The method of claim 10, wherein said antibody is bound to a secondary antibody conjugated to a label.
 13. The method of claim 1, wherein said probe comprises an aptamer.
 14. The method of claim 13, wherein said aptamer comprises a homopolymeric base region.
 15. The method of claim 1, wherein said plurality of analytes comprises a protein, a peptide aptamer, or a nucleic acid molecule.
 16. The method of claim 1, wherein said detecting from said at least M cycles a presence or an absence of a plurality of signals comprises optically detecting said plurality of signals.
 17. The method of claim 1, wherein said detecting from said at least M cycles a presence or an absence of a plurality of signals comprises electrically detecting said plurality of signals.
 18. The method of claim 1, wherein said method is computer implemented.
 19. The method of claim 1, wherein K is one bit of information per cycle.
 20. The method of claim 19, wherein K is two bits of information per cycle.
 21. The method of claim 20, wherein K is three or more bits of information per cycle.
 22. The method of claim 1, further comprising determining from said L bits of information an error correction for said plurality of output signals.
 23. The method of claim 22, wherein said error correction comprises using a Reed-Solomon code.
 24. The method of claim 1, further comprising determining a number of ordered probe reagent sets based on the number of N distinct target analytes.
 25. The method of claim 1, further comprising determining a type of probe reagent sets based on the type of N distinct target analytes.
 26. The method of claim 1, wherein said N distinct target analytes are present in a sample, and wherein the sample is divided into a plurality of aliquots that are diluted to a plurality of distinct final dilutions, each of said plurality of aliquots being immobilized onto a distinct section of the substrate.
 27. The method of claim 26, wherein one of the distinct final dilutions is determined based on a probable naturally-occurring concentration of at least one of the N distinct target analytes.
 28. The method of claim 26, wherein a concentration of one of the N distinct target analytes is determined by counting the occurrences of the target analyte within one of the distinct sections and adjusting the count according to the dilution of the respective aliquot.
 29. A kit for detecting a plurality of analytes, comprising: a plurality of ordered probe reagent sets, each of said ordered probe reagent sets comprising one or more probes directed to a defined subset of N distinct target analytes, wherein said N distinct target analytes are immobilized on spatially separate regions of a substrate, and each of said probes is detectably labeled; instructions for detecting said N distinct analytes based on a plurality of detectable signals, said instructions comprising: instructions for performing at least M cycles of probe binding and signal detection, each cycle comprising one or more passes, wherein a pass comprises use of at least one of said ordered probe reagent sets; instructions for detecting from said at least M cycles a presence or an absence of a plurality of signals from said spatially separate regions of said substrate; and instructions for determining from said plurality of signals at least K bits of information per cycle for one or more of said N distinct target analytes, wherein said at least K bits of information are used to determine L total bits of information, wherein K×M=L bits of information and L≧log₂ (N), and wherein said L bits of information are used to determine a presence or an absence of one or more of said N distinct target analytes.
 30. The kit of claim 29, wherein said one or more probes comprises an antibody.
 31. The kit of claim 29, wherein said label is a fluorescent label.
 32. The kit of claim 29, wherein said probe comprises an antibody.
 33. The kit of claim 32, wherein said antibody is conjugated directly to a label.
 34. The kit of claim 32, wherein said antibody is bound to a secondary antibody conjugated to a label.
 35. The kit of claim 29, wherein said probe comprises an aptamer.
 36. The kit of claim 35, wherein said aptamer comprises a homopolymeric base region.
 37. The kit of claim 29, wherein said plurality of analytes comprises a protein, a peptide aptamer, or a nucleic acid molecule.
 38. The kit of claim 29, wherein L>log₂ (N).
 39. The kit of claim 29, wherein M≦N.
 40. The kit of claim 29, further comprising instructions for determining an identification of each of said N distinct target analytes using said L bits of information, wherein L comprises bits of information for target identification.
 41. The kit of claim 29, further comprising instructions for determining an order of said plurality of ordered probe reagent sets using said L bits of information, wherein L comprises bits of information that are ordered in a predetermined order.
 42. The kit of claim 41, wherein said predetermined order is a random order.
 43. The kit of claim 29, further comprising instructions for using a key for decoding an order of said plurality of ordered probe reagent sets.